







THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(Founded by H. C. Carver)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XIX 


1948 



THE ANNALS
OF MATHEMATICAL STATISTICS

EDITED BY
S. S. WILKS, Editor

M. S. BARTLETT
WILLIAM G. COCHRAN
ALLEN T. CRAIG
C. C. CRAIG
HARALD CRAMÉR
W. EDWARDS DEMING
J. L. DOOB
W. FELLER
HAROLD HOTELLING
J. NEYMAN
WALTER A. SHEWHART
JOHN W. TUKEY
A. WALD

WITH THE COOPERATION OF

T. W. Anderson, Jr.
David Blackwell
J. H. Curtiss
J. F. Daly
Harold F. Dodge
Paul S. Dwyer
Churchill Eisenhart
M. A. Girshick
Paul R. Halmos
Paul G. Hoel
Mark Kac
E. L. Lehmann
William G. Madow
H. B. Mann
Alexander M. Mood
Frederick Mosteller
H. E. Robbins
Henry Scheffé
Jacob Wolfowitz


The Annals of Mathematical Statistics is published quarterly by the 
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2, 
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute
of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich. 

Changes in mailing address which are to become effective for a given 
issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. 

Manuscripts for publication in the Annals of Mathematical Statistics 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 

Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 

The subscription price for the Annals is $8.00 inside the Western Hemisphere
and $5.00 elsewhere. Single copies $3.00. Back numbers are available
at $8.00 per volume or $3.00 per single issue. 


Composed and Printed at the 
WAVERLY PRESS, Inc. 
Baltimore, Md., U. S. A. 


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879



CONTENTS OF VOLUME XIX 

Articles 


Banerjee, K. S. Weighing Designs and Balanced Incomplete Blocks  394
Bliss, C. I., and Cochran, W. G. Discriminant Functions with Covariance  151
Brennan, J. F., and Housner, G. W. The Estimation of Linear Trends  380
Camp, Burton H. Generalization to N Dimensions of Inequalities of the Tchebycheff Type  568
Cochran, W. G., and Bliss, C. I. Discriminant Functions with Covariance  151
Dwyer, P. S., and MacPhail, M. S. Symbolic Matrix Derivatives  517
Epstein, Benjamin. Some Applications of the Mellin Transform in Statistics  370
Feller, W. On the Kolmogorov-Smirnov Limit Theorems for Empirical Distributions  177
Gurland, John. Inversion Formulae for the Distribution of Ratios  228
Harris, T. E. Branching Processes  474
Hartman, Philip, and Wintner, Aurel. On the Effect of Decimal Correlation on Errors of Observation  389
Herbach, Leon H. Bounds for Some Functions Used in Sequentially Testing the Mean of a Poisson Distribution  400
Hoeffding, Wassily. A Class of Statistics with Asymptotically Normal Distributions  293
Hoeffding, Wassily. A Non-Parametric Test of Independence  546
Hoel, Paul G. On the Uniqueness of Similar Regions  66
Housner, G. W., and Brennan, J. F. The Estimation of Linear Trends  380
Kac, M. On the Characteristic Functions of the Distributions of Estimates of Various Deviations in Samples from a Normal Population  257
Kempthorne, O. The Factorial Approach to the Weighing Problem  238
Kendall, David G. On the Generalized “Birth-and-Death” Process  1
Kincaid, W. M. Solution of Equations by Interpolation  207
Lehmann, E. L., and Stein, C. Most Powerful Tests of Composite Hypotheses I. Normal Distributions  495
Lotka, Alfred J. Application of Recurrent Series in Renewal Theory  190
Madow, William G. On a Source of Downward Bias in the Analysis of Variance and Covariance  351
Madow, William G. On the Limiting Distributions of Estimates Based on Samples from Finite Universe  335
MacPhail, M. S., and Dwyer, P. S. Symbolic Matrix Derivatives  517
Mosteller, Frederick. A k-Sample Slippage Test for an Extreme Population  58
Nanda, D. N. Distribution of a Root of a Determinantal Equation  47
Nanda, D. N. Limiting Distribution of a Root of a Determinantal Equation  340
Plackett, R. L. Boundaries of Minimum Size in Binomial Sampling  575
Richards, Paul I. Probability of Coincidence for Two Periodically Recurring Events  16
Robbins, Herbert. Mixture of Distributions  300
Silber, Jack. Multiple Sampling for Variables  240
Stein, C., and Lehmann, E. L. Most Powerful Tests of Composite Hypotheses I. Normal Distributions  495
Tukey, John W. Non-Parametric Estimation III. Statistically Equivalent Blocks and Multivariate Tolerance Regions - The Discontinuous Case  30
Votaw, D. F., Jr. Testing Compound Symmetry in a Normal Multivariate Distribution  417
Wald, Abraham. Asymptotic Properties of the Maximum Likelihood Estimate of an Unknown Parameter of a Discrete Stochastic Process  40
Wald, Abraham. Estimation of a Parameter when the Number of Unknown Parameters Increases Indefinitely with the Number of Observations  220
Wald, Abraham, and Wolfowitz, J. Optimum Character of the Sequential Probability Ratio Test  320
Wintner, Aurel, and Hartman, Philip. On the Effect of Decimal Correlation on Errors of Observation  389
Wold, Herman O. A. On Prediction in Stationary Time Series  558
Wolfowitz, J., and Wald, Abraham. Optimum Character of the Sequential Probability Ratio Test  320

Notes 

Albert, G. E. Correction to “A Note on the Fundamental Identity of Sequential Analysis”  420
Aroian, Leo A. The Fourth Degree Exponential Distribution Function  589
Bacon, H. M. A Matrix Arising in Correlation Theory  422
Birnbaum, Z. W. On Random Variables with Comparable Peakedness  70
Chung, Kai Lai. On a Lemma by Kolmogoroff  88
Cramer, G. F. An Approximation to the Binomial Summation  592
Dixon, W. J. Table of Normal Probabilities for Intervals of Various Lengths and Locations  421
Guttman, Louis. A Distribution-Free Confidence Interval for the Mean  410
Guttman, Louis. An Inequality for Kurtosis  277
Horton, H. Burke. A Method for Obtaining Random Numbers  81
Hsu, L. C. Note on an Asymptotic Expansion of the nth Difference of Zero  273
Jones, Howard L. Exact Lower Moments of Order Statistics in Small Samples from a Normal Distribution  270
Kincaid, W. M. Note on the Error in Interpolation of a Function of Two Independent Variables  85
Kullback, S. Correction to “On the Charlier Type B Series”  427
Maceda, E. Cansado. On the Compound and Generalized Poisson Distributions  414
Marks, Eli S. A Lower Bound for the Expected Travel Among m Random Points  419
Murphy, R. B. Non-Parametric Tolerance Limits  581
Noether, Gottfried E. On Confidence Limits for Quantiles  110
Rasch, G. A Functional Equation for Wishart’s Distribution  262
Robbins, Herbert. Convergence of Distributions  72
Robbins, Herbert. The Distribution of a Definite Quadratic Form  266
Robbins, Herbert. The Distribution of Student’s t when the Population Means are Unequal  106
Smirnov, N. Table for Estimating the Goodness of Fit of Empirical Distributions  279
Tukey, John W. Approximate Weights  91
Walsh, John E. On the Use of the Non-Central t-Distribution for Comparing Percentage Points of Normal Populations  93

Miscellaneous 

Abstracts of Papers . 116, 128, 595 

Adoption of the New Constitution 607 

Book Reviews 122, 282, 436 

Constitution and By-Laws of the Institute 115

News and Notices 127, 285, 438, 603 

Report of the Editor . . 144 

Report of the Institute Committee on Teaching of Statistics 95 

Report of the President of the Institute 137 

Report of the Secretary-Treasurer of the Institute 141 

Report on the Berkeley Meeting of the Institute 132, 444 

Report on the Chicago Meeting of the Institute 136 

Report on the Madison Meeting of the Institute 608 

Report on the New York Meetings of the Institute 133, 290 




ON THE GENERALIZED “BIRTH-AND-DEATH” PROCESS 

By David G. Kendall 
Magdalen College, Oxford 

1. Introduction and Summary. The importance of stochastic processes in
relation to problems of population growth was pointed out by W. Feller [1]
in 1939. He considered among other examples the “birth-and-death” process
in which the expected birth and death rates (per head of population per unit of
time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper I shall give the complete
solution of the equations governing the generalised birth-and-death process
in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions
of the time $t$. The mathematical method employed starts from M. S. Bartlett’s
idea of replacing the differential-difference equations for the distribution of the
population size by a partial differential equation for its generating function. For
an account of this technique,¹ reference may be made to Bartlett’s North Carolina
lectures [2].

The formulae obtained lead to an expression for the probability of the ultimate 
extinction of the population, and to the necessary and sufficient condition for a 
birth-and-death process to be of “transient” type. For transient processes 
the distribution of the cumulative population is also considered, but here in 
general it is not found possible to do more than evaluate its mean and variance 
as functions of t , although a complete solution (including the determination of 
the asymptotic form of the distribution as t tends to infinity) is obtained for the 
simple process in which the birth and death rates are independent of the time. 

It is shown that a birth-and-death process can be constructed to give an
expected population size $\bar{n}_t$ which is any desired function of the time $t$, and among
the many possible solutions the unique one is determined which makes the
fluctuation, $\mathrm{Var}(n_t)$, a minimum for all $t$.

The general theory is illustrated with reference to two examples. The first
of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the
cascade showers associated with cosmic radiation; here the birth rate is constant
and the death rate is a constant multiple of the “age”, $t$, of the process. The
$\bar{n}_t$-curve is then Gaussian in form, and the process is always of transient type.

The second example is provided by the family of “periodic” processes, in
which the birth and death rates are periodic functions of the time $t$. These
appear well adapted to describe the response of population growth (or epidemic
spread) to the influence of the seasons.

¹ It appears from some remarks by Arley and Borchsenius [5] that the generating function
method was first employed in problems of this kind by Dr. C. Palm.

2. The formulation and solution of the equations for the general $(\lambda, \mu)$ process.
Let the integer-valued time-dependent random variable $n_t$ measure at time $t$ the
size of a population, and suppose that in an element of time $dt$ the only possible
transitions (and their associated probabilities) are:

(1)  $$n_{t+dt} = n_t + 1, \qquad \lambda(t)\,n_t\,dt + o(dt);$$
     $$n_{t+dt} = n_t, \qquad 1 - \{\lambda(t) + \mu(t)\}\,n_t\,dt + o(dt);$$
     $$n_{t+dt} = n_t - 1, \qquad \mu(t)\,n_t\,dt + o(dt).$$

As an initial condition it will be supposed that the population is descended from
a single “ancestor”, so that $n_0 = 1$, and thus

(2)  $$P_1(0) = 1, \qquad P_n(0) = 0 \quad (n \neq 1).$$

It then follows that the $P_n(t)$ must satisfy the differential-difference equations

(3)  $$\frac{d}{dt} P_n(t) = (n+1)\,\mu\,P_{n+1}(t) + (n-1)\,\lambda\,P_{n-1}(t) - n(\lambda+\mu)\,P_n(t), \qquad n \ge 1,$$

and

(4)  $$\frac{d}{dt} P_0(t) = \mu\,P_1(t)$$

(where for convenience of writing I have ceased to indicate explicitly the
dependence of $\lambda$ and $\mu$ on the time). If $P_n(t)$ is defined to be zero when $n < 0$,
the first of the above equations will then be true for all $n$, and accordingly the
generating function

(5)  $$\varphi(z, t) = \sum_{n=-\infty}^{\infty} P_n(t)\,z^n$$

must satisfy the linear partial differential equation

(6)  $$\frac{\partial \varphi}{\partial t} = (\lambda z - \mu)(z - 1)\,\frac{\partial \varphi}{\partial z};$$

the problem is to find the solution to this equation when it is coupled with the
boundary condition $\varphi(z, 0) = z$.
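
As a numerical cross-check on the differential-difference equations (3) and (4), the system can be truncated at some largest population size and integrated step by step. The sketch below is illustrative only and is not part of the paper: the function name `integrate_master_equations`, the truncation level `n_max` and the use of a classical fourth-order Runge-Kutta step are all assumptions made for this example.

```python
def integrate_master_equations(lam, mu, t_end, n_max=60, steps=1000):
    """RK4 integration of equations (3)-(4), truncated at n_max, with P_1(0) = 1.

    lam and mu are callables giving the birth and death rates at time t.
    Returns the list [P_0(t_end), ..., P_n_max(t_end)].
    """

    def deriv(t, p):
        l, m = lam(t), mu(t)
        d = [m * p[1]]                               # equation (4)
        for n in range(1, n_max + 1):                # equation (3)
            above = p[n + 1] if n < n_max else 0.0   # assumption: P_n = 0 beyond n_max
            d.append((n + 1) * m * above
                     + (n - 1) * l * p[n - 1]
                     - n * (l + m) * p[n])
        return d

    def shifted(p, k, c):
        return [a + c * b for a, b in zip(p, k)]

    p = [0.0] * (n_max + 1)
    p[1] = 1.0                                       # single ancestor, equation (2)
    h = t_end / steps
    for i in range(steps):
        t = i * h
        k1 = deriv(t, p)
        k2 = deriv(t + h / 2, shifted(p, k1, h / 2))
        k3 = deriv(t + h / 2, shifted(p, k2, h / 2))
        k4 = deriv(t + h, shifted(p, k3, h))
        p = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(p, k1, k2, k3, k4)]
    return p
```

The truncation is harmless only when the probability of reaching `n_max` is negligible over the interval considered, so the level should be raised for strongly growing processes.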

The equation (6) is of Lagrange's type, and can be solved in the usual manner.
The auxiliary equation is

(7)  $$\frac{dz}{dt} = -\mu + (\lambda + \mu)z - \lambda z^2,$$

and while in particular examples it might be convenient to attack this equation
directly, progress in general is more easily made by observing that (7) is of
Riccati’s form, for which a general theory is available.² The fundamental
property of a Riccati equation is that the general solution is a homographic
function of the constant of integration, so that

$$z = \frac{f_1 + C f_2}{f_3 + C f_4},$$

and equally

$$C = \frac{f_1 - z f_3}{z f_4 - f_2},$$

where $f_1$, $f_2$, $f_3$ and $f_4$ are all functions of the time $t$. Thus the general solution
of (6) is of the form

$$\varphi(z, t) = \Phi\!\left(\frac{f_1 - z f_3}{z f_4 - f_2}\right),$$

where $\Phi$ is an arbitrary function, and from the boundary condition $\varphi(z, 0) = z$ it then follows that

$$\varphi(z, t) = \frac{g_1(t) + z\,g_2(t)}{g_3(t) + z\,g_4(t)}.$$

On expansion, one obtains

(8)  $$P_0(t) = \xi_t, \quad \text{and} \quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1),$$

where $\xi_t$ and $\eta_t$ are functions of the time $t$. Thus, for the general $(\lambda, \mu)$ process,
the population size at any time is distributed in a geometric series with a modified
zero term.

² See, for example, G. N. Watson [4], pp. 93-94.

The next stage of the solution is to determine the functions $\xi_t$ and $\eta_t$. From (8),

$$\varphi(z, t) = \frac{\xi + (1 - \xi - \eta)z}{1 - \eta z},$$

and if this expression for $\varphi$ be substituted in (6) it will be found³ that

$$(\eta\xi' - \xi\eta') + \eta' = \lambda(1 - \xi)(1 - \eta),$$
$$\xi' = \mu(1 - \xi)(1 - \eta).$$

³ Here $\xi' \equiv d\xi/dt$, etc.

Now let $U = 1 - \xi$ and $V = 1 - \eta$, so that

$$U'/U = -\mu V,$$

and

$$V' = (\mu - \lambda)V - \mu V^2.$$

The last equation is of Bernoulli’s type and can be solved by writing

$$W = 1/V,$$

so that

$$W' + (\mu - \lambda)W = \mu.$$

Initially $\xi = \eta = 0$, and $U = V = W = 1$; the solution of the $W$-equation is
therefore

(10a)  $$W = e^{-\rho}\left\{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\},$$

where the function $\rho$ is defined by

$$\rho(t) = \int_0^t \{\mu(\tau) - \lambda(\tau)\}\,d\tau.$$

Integration by parts gives two other formulae for $W$ which will prove useful;
they are

(10b)  $$W = 1 + e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau$$

and

(10c)  $$W = \tfrac{1}{2}(1 + e^{-\rho}) + \tfrac{1}{2}e^{-\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau.$$

The quantities $U$ and $V$, and hence also $\xi$ and $\eta$, can now be expressed in terms of
$\rho$ and $W$, for

$$U = \frac{e^{-\rho}}{W} \quad \text{and} \quad V = \frac{1}{W},$$

and so

(12)  $$\xi_t = 1 - \frac{e^{-\rho}}{W} \quad \text{and} \quad \eta_t = 1 - \frac{1}{W}.$$

These results, together with (8), suffice to determine completely the $P_n(t)$ as
functions of the time $t$.
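
The recipe just described (compute $\rho$, then $W$, then $\xi_t$ and $\eta_t$, then the geometric series (8)) can be sketched numerically as follows. This is an illustration under stated assumptions rather than anything given in the paper: the names `integrate` and `size_distribution` are hypothetical, Simpson's rule is an arbitrary choice of quadrature, and the rates are assumed smooth enough for it to be accurate.

```python
import math


def integrate(f, a, b, steps=400):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def size_distribution(lam, mu, t, n_max=10, steps=400):
    """Return [P_0(t), ..., P_n_max(t)] for a single ancestor, via (8), (10a), (12)."""
    # rho(s) = integral of {mu - lam} from 0 to s
    rho = lambda s: integrate(lambda x: mu(x) - lam(x), 0.0, s, steps)
    # W from formula (10a)
    inner = integrate(lambda s: math.exp(rho(s)) * mu(s), 0.0, t, steps)
    w = math.exp(-rho(t)) * (1.0 + inner)
    # xi and eta from formula (12)
    xi = 1.0 - math.exp(-rho(t)) / w
    eta = 1.0 - 1.0 / w
    # modified geometric series, formula (8)
    return [xi] + [(1.0 - xi) * (1.0 - eta) * eta ** (n - 1)
                   for n in range(1, n_max + 1)]
```

For example, `size_distribution(lambda t: 1.0, lambda t: 2.0 * t, 1.5)` would evaluate the distribution for a process with constant birth rate and a death rate proportional to the age of the process; the nested quadrature is deliberately simple and not optimised.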

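It is easy to deduce formulae for the mean and variance of $n_t$ (these could also
be obtained directly from (6)). For the mean,

(13)  $$\bar{n}_t = \frac{1 - \xi_t}{1 - \eta_t} = e^{-\rho(t)},$$

while for the variance,

(14c)  $$\mathrm{Var}(n_t) = \frac{(1 - \xi_t)(\xi_t + \eta_t)}{(1 - \eta_t)^2} = e^{-\rho}(2W - 1 - e^{-\rho}) = e^{-2\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau.$$

Alternatively, using the other forms for $W$, one can write

(14a)  $$\mathrm{Var}(n_t) = e^{-\rho}\left\{e^{-\rho} - 1 + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}$$

(14b)  $$\phantom{\mathrm{Var}(n_t)} = e^{-\rho}\left\{1 - e^{-\rho} + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau\right\}.$$

If the initial population $n_0 = N > 1$, these formulae for $\bar{n}_t$ and $\mathrm{Var}(n_t)$ are to
be multiplied by $N$.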

It is now a simple matter to apply these formulae to the Arley $(\lambda_0, \mu_1 t)$ process.
It will be found that

$$\rho = \tfrac{1}{2}\mu_1 t^2 - \lambda_0 t,$$

and

$$W = 1 + \lambda_0\,e^{-\frac{1}{2}\mu_1 t^2 + \lambda_0 t}\int_0^t e^{\frac{1}{2}\mu_1\tau^2 - \lambda_0\tau}\,d\tau.$$

The mean growth of the process therefore follows the Gaussian law

$$\bar{n}_t = e^{\lambda_0 t - \frac{1}{2}\mu_1 t^2},$$

while for the variance (using (14b), since $\lambda$ is a constant) one finds

$$\mathrm{Var}(n_t) = \bar{n}_t(1 - \bar{n}_t) + 2\lambda_0\,\bar{n}_t^{\,2}\int_0^t e^{\frac{1}{2}\mu_1\tau^2 - \lambda_0\tau}\,d\tau,$$

in agreement with Arley [3] and Bartlett [2]. The distribution of $n_t$ at time $t$
follows on inserting the above values of $\rho$ and $W$ into (8) and (12).

3. The chances of extinction. The simplest special case is that in which
$(\lambda, \mu)$ have the constant values $(\lambda_0, \mu_0)$; this is the process introduced by Feller
[1] and later discussed by several writers.⁴ The formulae (13) and (14c) give
at once the results

(15)  $$\bar{n}_t = e^{(\lambda_0 - \mu_0)t} \quad \text{and} \quad \mathrm{Var}(n_t) = \frac{\lambda_0 + \mu_0}{\lambda_0 - \mu_0}\,\bar{n}_t(\bar{n}_t - 1),$$

due to Feller, while since

$$W = \frac{\lambda_0\bar{n}_t - \mu_0}{\lambda_0 - \mu_0},$$

equations (8) and (12) give

(16)  $$P_0(t) = \frac{\mu_0(\bar{n}_t - 1)}{\lambda_0\bar{n}_t - \mu_0} \quad \text{and} \quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1),$$

where

$$\eta_t = \frac{\lambda_0}{\mu_0}P_0(t) = \frac{\lambda_0(\bar{n}_t - 1)}{\lambda_0\bar{n}_t - \mu_0}.$$

These formulae were first given by C. Palm.⁴ They actually hold only if
$\lambda_0 \neq \mu_0$; in the case of equality, $W = 1 + \lambda_0 t$, and then

$$\bar{n}_t = 1, \qquad \mathrm{Var}(n_t) = 2\lambda_0 t,$$

(17)  $$P_0(t) = \frac{\lambda_0 t}{1 + \lambda_0 t} \quad \text{and} \quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1),$$

where $\eta_t = P_0(t)$.

⁴ See Arley [3], Arley and Borchsenius [5], Bartlett [2] and Kendall [6]. Palm's formulae
(16) are stated without proof by Arley and Borchsenius, but it appears from their remarks
that he used a generating function method probably identical with that later employed by
Bartlett and myself.

One particularly interesting point is that

$$P_0(t) \to 1 \quad \text{as} \quad t \to \infty \quad \text{if} \quad \lambda_0 \le \mu_0,$$

so that the population is “almost certain” to die out, even though in the critical
case ($\lambda_0 = \mu_0$) the expected population size $\bar{n}_t$ has a constant value. The same is
true for any initial size of population; the new expression for $P_0(t)$ is then simply
equal to the former one raised to the power $n_0 = N$, and therefore tends to unity
as before. This phenomenon of extinction was first noticed in a similar problem⁵
by Francis Galton and H. W. Watson; an account of their work is given in
Appendix F of Galton’s book [7].

⁵ The extinction of family-names. Further references will be found in my paper [6].

The formulae of the last section now make possible a discussion of the chances
of extinction for the general $(\lambda, \mu)$ process. When $n_0 = 1$,

(18)  $$P_0(t) = \xi_t = \frac{\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau},$$

and so the necessary and sufficient condition for the ultimate extinction of the
population is that the integral

(19)  $$I = \int_0^\infty e^{\rho(\tau)}\mu(\tau)\,d\tau$$

should be divergent.

It will be noticed that the integrand of (19) is non-negative, and so the
integral must either diverge to plus infinity, or have a finite value. Hence in any
case the population always has a definite chance of extinction, given by $I/(1 + I)$
(to be read as unity when $I$ diverges).

For a population descended from $N$ initial ancestors, the $P_n(t)$ are generated
by the function

(20)  $$\left\{\frac{\xi + (1 - \xi - \eta)z}{1 - \eta z}\right\}^N,$$

so that

$$P_0(t) = \xi_t^{\,N},$$

and the chance of ultimate extinction is

(21)  $$\left\{\frac{I}{1 + I}\right\}^N,$$

which is or is not equal to unity for all $N$ indifferently.

Extinction is impossible, in the sense of being an event of zero probability, if
and only if $\mu$ is identically zero, so that the process is one of reproduction only.
It is also worth noting that a necessary but not sufficient condition for almost
certain extinction is the divergence of the integral

(22)  $$\int_0^\infty \mu(\tau)\,d\tau.$$

For if (22) had a finite value, $\rho(t)$ would be bounded above for all $t$, and so (19) could
not be divergent. In general, when $I = \infty$ and the population is almost certainly
doomed to extinction, I shall speak of the process as transient.
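
The extinction criterion can be explored numerically by truncating the integral of $e^{\rho(\tau)}\mu(\tau)$ at a large finite horizon. The following is a rough sketch only: the function name `extinction_probability`, the default `horizon` and the trapezoidal rule are assumptions of this example, and a finite horizon can of course only suggest, not prove, divergence.

```python
import math


def extinction_probability(lam, mu, horizon=50.0, steps=20000, n_ancestors=1):
    """Approximate {I / (1 + I)}**N, with I the integral of exp(rho) * mu
    truncated at `horizon` (trapezoidal rule for both rho and I)."""
    h = horizon / steps
    rho = 0.0                                   # rho(0) = 0
    integral = 0.0
    prev = math.exp(rho) * mu(0.0)
    for i in range(1, steps + 1):
        t0, t1 = (i - 1) * h, i * h
        # advance rho(t) = integral of {mu - lam}
        rho += 0.5 * h * ((mu(t0) - lam(t0)) + (mu(t1) - lam(t1)))
        cur = math.exp(rho) * mu(t1)
        integral += 0.5 * h * (prev + cur)
        prev = cur
    return (integral / (1.0 + integral)) ** n_ancestors
```

With constant rates $\lambda_0 = 2$, $\mu_0 = 1$ the integral converges to $\mu_0/(\lambda_0 - \mu_0) = 1$, so the sketch returns a value close to one half for a single ancestor; with $\lambda_0 < \mu_0$ the truncated integral is enormous and the result is indistinguishable from unity.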

For a transient process it is of interest to consider the random variable $T$,
defined to be the “age” of the process at the moment of extinction. Since

$$P_0(t) = \text{Probability}\,\{T \le t\},$$

the probability distribution of $T$ is $P_0'(T)\,dT$, or

(23)  $$\frac{e^{\rho(T)}\mu(T)\,dT}{\left\{1 + \int_0^T e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}^2}, \qquad 0 < T < \infty.$$

For example, in the simplest birth-and-death process, when $\lambda$ and $\mu$ are equal
constants, the distribution of $T$ is

(24)  $$\frac{\lambda_0\,dT}{(1 + \lambda_0 T)^2}, \qquad 0 < T < \infty.$$

This is for an initial population $n_0 = 1$; more generally, when $n_0 = N > 1$, the
distribution of $T$ is

$$N\,P_0'(T)\,\{P_0(T)\}^{N-1}\,dT.$$

The median life-time $T_m$ is determined by the relation

(25)  $$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = 1.$$

For the simple process, $T_m = 1/\lambda_0$ when $\lambda_0 = \mu_0$, and more generally

(26)  $$T_m = \frac{1}{\mu_0 - \lambda_0}\log\frac{2\mu_0 - \lambda_0}{\mu_0} \qquad (\lambda_0 \neq \mu_0)$$

if $n_0 = 1$. When $n_0 = N > 1$, the formula for $T_m$ becomes

(27)  $$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = \frac{1}{2^{1/N} - 1}.$$

For the balanced process $(\lambda_0, \lambda_0)$ it therefore follows that

(28)  $$T_m(N) = \frac{T_m(1)}{2^{1/N} - 1} \sim 1.44\,N\,T_m(1),$$

as $N$ tends to infinity. If the process is unbalanced, however, so that $\lambda_0 < \mu_0$,
this asymptotic proportionality to $N$ does not hold, and instead

(29)  $$T_m = \frac{1}{\mu_0 - \lambda_0}\log\left\{\frac{2^{1/N}\mu_0 - \lambda_0}{(2^{1/N} - 1)\,\mu_0}\right\} \sim \frac{\log N}{\mu_0 - \lambda_0},$$

as $N$ tends to infinity.
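
For constant rates the median life-time has a closed form, obtained by evaluating the integral of $e^{\rho(\tau)}\mu(\tau)$ explicitly and setting it equal to $1/(2^{1/N} - 1)$. The short sketch below evaluates it; the function name `median_lifetime` is hypothetical, and the restriction to $\lambda_0 \le \mu_0$ (the transient case) is an assumption made to keep the example simple.

```python
import math


def median_lifetime(lam0, mu0, n_ancestors=1):
    """Median time to extinction for constant rates lam0 <= mu0 and N ancestors."""
    if lam0 > mu0:
        raise ValueError("sketch covers only the transient case lam0 <= mu0")
    root = 2.0 ** (1.0 / n_ancestors)
    if lam0 == mu0:
        # balanced process: the integral equals lam0 * T
        return 1.0 / ((root - 1.0) * lam0)
    # unbalanced process: solve mu0 * (exp((mu0 - lam0) T) - 1) / (mu0 - lam0) = 1 / (root - 1)
    return math.log((root * mu0 - lam0) / ((root - 1.0) * mu0)) / (mu0 - lam0)
```

Calling `median_lifetime(1.0, 1.0, n_ancestors=1000) / 1000` gives a ratio close to $1/\log 2 \approx 1.44$, in line with the asymptotic proportionality to $N$ for the balanced process.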


4. The cumulative population. There is associated with a birth-and-death
process another random variable, $M_t$, which is of importance in some applications.
This is defined as follows: initially $M_0 = n_0$, while for $t > 0$, $M_t$ shares
all the positive jumps of $n_t$.

For example, if $n_t$ represents the number of cases of a disease in a population
at time $t$, $M_t$ will be the total number of cases which have been recorded up to
that time. If the process is transient, so that the epidemic is almost certainly
extinguished in the course of time, $M_\infty$ will then be a measure of its overall
severity.

Again, if $n_t$ represents the viable count of a population of bacteria⁶ with a birth
rate $\lambda(t)$ and a death rate $\mu(t)$, $M_t$ will be equal to the total count in which living
and dead organisms are not distinguished.

⁶ For some general remarks about birth-and-death processes in relation to bacterial
growth, reference may be made to my paper [6].

In order to discuss the joint variation of $n_t$ and $M_t$ it is necessary to introduce
the new generating function

(30)  $$\psi(z, w, t) = \sum_{n=0}^{\infty}\sum_{M=0}^{\infty} P_{n,M}(t)\,z^n w^M.$$

Here the $P_{n,M}(t)$ give the joint frequency-distribution of $n_t$ and $M_t$ at time $t$.
By the usual argument the differential equation satisfied by the function $\psi$
will be found to be

(31)  $$\frac{\partial \psi}{\partial t} = \{\lambda w z^2 - (\lambda + \mu)z + \mu\}\,\frac{\partial \psi}{\partial z},$$

and the associated boundary condition (if initially $n_0 = M_0 = 1$) is

(32)  $$\psi(z, w, 0) = zw.$$

I have been unable to solve this equation for general $\lambda(t)$ and $\mu(t)$; the solution
when $\lambda$ and $\mu$ are constants will be given in the next section. It is however
possible to find general expressions for the mean and variance of $M_t$; for this
purpose it is more convenient⁷ to work with the cumulant-generating function

(33)  $$K(u, v, t) = \log \psi(e^u, e^v, t).$$

This satisfies the differential equation

(34)  $$\frac{\partial K}{\partial t} = \{\lambda(e^{u+v} - 1) - \mu(1 - e^{-u})\}\,\frac{\partial K}{\partial u},$$

and of course

(35)  $$K = u\,\bar{n}_t + v\,\bar{M}_t + \tfrac{1}{2}u^2\,\mathrm{Var}(n_t) + \tfrac{1}{2}v^2\,\mathrm{Var}(M_t) + uv\,\mathrm{Cov}(n_t, M_t) + \cdots.$$

Expanding both sides of the equation in powers of $u$ and $v$, and equating
coefficients, one obtains the differential equations

(36)  $$\frac{d}{dt}\bar{n}_t = (\lambda - \mu)\,\bar{n}_t,$$

(37)  $$\frac{d}{dt}\mathrm{Var}(n_t) = (\lambda + \mu)\,\bar{n}_t + 2(\lambda - \mu)\,\mathrm{Var}(n_t),$$

(38)  $$\frac{d}{dt}\bar{M}_t = \lambda\,\bar{n}_t,$$

(39)  $$\frac{d}{dt}\mathrm{Var}(M_t) = \lambda\,\bar{n}_t + 2\lambda\,\mathrm{Cov}(n_t, M_t),$$

and

(40)  $$\frac{d}{dt}\mathrm{Cov}(n_t, M_t) = \lambda\,\bar{n}_t + \lambda\,\mathrm{Var}(n_t) + (\lambda - \mu)\,\mathrm{Cov}(n_t, M_t).$$

The solutions to the first two equations have of course already been given in
section 2; from the third it follows that the mean value of $M_t$ is

(41)  $$\bar{M}_t = 1 + \int_0^t e^{-\rho(\tau)}\lambda(\tau)\,d\tau.$$

The solution of the fifth equation is

(42)  $$\mathrm{Cov}(n_t, M_t) = \bar{n}_t\int_0^t\left\{1 + \frac{\mathrm{Var}(n_\tau)}{\bar{n}_\tau}\right\}\lambda(\tau)\,d\tau,$$

and so the variance of $M_t$ is

(43)  $$\mathrm{Var}(M_t) = \int_0^t \{\bar{n}_\tau + 2\,\mathrm{Cov}(n_\tau, M_\tau)\}\,\lambda(\tau)\,d\tau.$$

⁷ Compare Bartlett [2].



In illustration of these formulae, consider first the Arley $(\lambda_0, \mu_1 t)$ process; from
(41)

(44)  $$\bar{M}_t = 1 + \lambda_0\int_0^t e^{\lambda_0\tau - \frac{1}{2}\mu_1\tau^2}\,d\tau,$$

but the complete expression for $\mathrm{Var}(M_t)$ will be a multiple integral which does
not appear to admit of much simplification.

For the simple $(\lambda_0, \mu_0)$ process, however, when $\lambda_0 < \mu_0$, it readily follows that

(45)  $$\bar{M}_t = \frac{\mu_0 - \lambda_0\bar{n}_t}{\mu_0 - \lambda_0},$$

(46)  $$\mathrm{Cov}(n_t, M_t) = \frac{\lambda_0\bar{n}_t}{\mu_0 - \lambda_0}\left\{2\mu_0 t - \frac{\mu_0 + \lambda_0}{\mu_0 - \lambda_0}(1 - \bar{n}_t)\right\},$$

and

(47)  $$\mathrm{Var}(M_t) = \frac{\lambda_0\mu_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}(1 - \bar{n}_t) - \frac{4\lambda_0^2\mu_0}{(\mu_0 - \lambda_0)^2}\,t\,\bar{n}_t + \frac{\lambda_0^2(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}\,\bar{n}_t(1 - \bar{n}_t).$$

Thus in the limit, as $t \to \infty$, the mean and variance of $M_\infty$ are

(48)  $$\bar{M}_\infty = \frac{\mu_0}{\mu_0 - \lambda_0} \quad \text{and} \quad \mathrm{Var}(M_\infty) = \frac{\lambda_0\mu_0(\lambda_0 + \mu_0)}{(\mu_0 - \lambda_0)^3},$$

the covariance of course tending to zero. If the process is balanced, so that
$\lambda_0 = \mu_0$ and $\bar{n}_t = 1$, the integral for $\bar{M}_t$ has the value $1 + \lambda_0 t$, which increases
without limit as $t$ tends to infinity. This will always be so for a balanced process
if the integral

$$\int_0^\infty \lambda(\tau)\,d\tau$$

is divergent.

If the initial population $n_0$ is equal to $N > 1$, and if all its members are counted
into $M_0$, the only modification necessary to the above formulae is that in each
case the right-hand side is to be multiplied by $N$.
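
The five moment equations for $\bar{n}_t$, $\mathrm{Var}(n_t)$, $\bar{M}_t$, $\mathrm{Var}(M_t)$ and $\mathrm{Cov}(n_t, M_t)$ form a closed linear system, so they can be integrated numerically for arbitrary rate functions even where the closed forms become unwieldy. The sketch below does this with a fourth-order Runge-Kutta step; the function name `cumulative_moments`, the dictionary keys and the step count are assumptions of this example, and a single ancestor ($n_0 = M_0 = 1$) is assumed.

```python
def cumulative_moments(lam, mu, t_end, steps=4000):
    """Integrate the moment equations for (n_t, M_t) from n_0 = M_0 = 1."""

    def deriv(t, y):
        nbar, var_n, mbar, var_m, cov = y
        l, m = lam(t), mu(t)
        return (
            (l - m) * nbar,                           # mean of n_t
            (l + m) * nbar + 2.0 * (l - m) * var_n,   # variance of n_t
            l * nbar,                                 # mean of M_t
            l * nbar + 2.0 * l * cov,                 # variance of M_t
            l * nbar + l * var_n + (l - m) * cov,     # covariance
        )

    def shifted(y, k, c):
        return tuple(a + c * b for a, b in zip(y, k))

    y = (1.0, 0.0, 1.0, 0.0, 0.0)
    h = t_end / steps
    for i in range(steps):
        t = i * h
        k1 = deriv(t, y)
        k2 = deriv(t + h / 2, shifted(y, k1, h / 2))
        k3 = deriv(t + h / 2, shifted(y, k2, h / 2))
        k4 = deriv(t + h, shifted(y, k3, h))
        y = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
    keys = ("mean_n", "var_n", "mean_M", "var_M", "cov")
    return dict(zip(keys, y))
```

For constant rates $\lambda_0 = 1$, $\mu_0 = 2$ and a long horizon, the computed mean and variance of $M_t$ settle near $\mu_0/(\mu_0 - \lambda_0) = 2$ and $\lambda_0\mu_0(\lambda_0 + \mu_0)/(\mu_0 - \lambda_0)^3 = 6$ respectively, with the covariance decaying to zero.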


5. The asymptotic distribution of the cumulative population for a simple
transient birth-and-death process. The equation (31), which appears in the
general case to be intractable even if one only requires the asymptotic
distribution determined by $\psi(1, w, \infty)$, can be solved completely in the specially simple
case when the birth and death rates $\lambda(t)$ and $\mu(t)$ have the constant values $\lambda_0$
and $\mu_0$.

Let $\alpha$ and $\beta$ be the roots of the quadratic

(49)  $$\lambda_0 w z^2 - (\lambda_0 + \mu_0)z + \mu_0 = 0,$$

so chosen that $0 < \alpha < 1 < \beta$; then the general solution of (31) will be found by
the usual method to be

$$\psi = \Psi\left\{\frac{z - \alpha}{\beta - z}\,e^{-\lambda_0 w(\beta - \alpha)t}\right\},$$

where $\Psi$ is an arbitrary function. The boundary condition $\psi(z, w, 0) = zw$ therefore gives

(50)  $$\psi = w\,\frac{\alpha(\beta - z) + \beta(z - \alpha)\,e^{-\lambda_0 w(\beta - \alpha)t}}{(\beta - z) + (z - \alpha)\,e^{-\lambda_0 w(\beta - \alpha)t}},$$

and it may be noted that if $n_0 = M_0 = N > 1$, this formula for $\psi$ would have to
be raised to the $N$th power. It will suffice, however, to discuss the simplest
case when $n_0 = M_0 = 1$.

Let the process be transient, so that $\lambda_0 < \mu_0$; then the asymptotic frequency
distribution of $M_t$ when $t \to \infty$ is determined by the generating function

(51)  $$\psi(1, w, \infty) = w\alpha = \frac{\lambda_0 + \mu_0 - \sqrt{(\lambda_0 + \mu_0)^2 - 4\lambda_0\mu_0 w}}{2\lambda_0},$$

and here it is the positive square root which must be taken. The probability
distribution of $M_\infty$ is thus

(52)  $$Q_M = \frac{\lambda_0 + \mu_0}{2\lambda_0}\cdot\frac{(2M)!}{2^{2M}(M!)^2}\cdot\frac{x^M}{2M - 1} \qquad (M = 1, 2, 3, \ldots),$$

where

(53)  $$x = \frac{4\lambda_0\mu_0}{(\lambda_0 + \mu_0)^2}.$$

The first few terms are

(54)  $$\frac{\mu_0}{\lambda_0 + \mu_0}, \quad \frac{\lambda_0\mu_0^2}{(\lambda_0 + \mu_0)^3}, \quad \frac{2\lambda_0^2\mu_0^3}{(\lambda_0 + \mu_0)^5}, \quad \ldots,$$

and it is easy to verify that the mean and variance of this distribution agree
with the values given in the last section. When $\lambda_0 = \mu_0$, $x = 1$, and then the
terms in (54) fall off to zero like $M^{-3/2}$, $\bar{M}_\infty$ being infinite (in accordance with the
remarks at the end of section 4).
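
The limiting distribution of the cumulative population is easy to tabulate, and doing so gives a direct check that its mean and variance agree with the limiting values found in section 4. The sketch below uses a term-by-term recurrence for the factor $(2M)!/\{2^{2M}(M!)^2\}$ to avoid large factorials; the function name `cumulative_size_pmf` is hypothetical.

```python
def cumulative_size_pmf(lam0, mu0, m_max):
    """Return [Q_1, ..., Q_m_max], the limiting distribution of the total
    population size for constant rates lam0 <= mu0 and a single ancestor."""
    x = 4.0 * lam0 * mu0 / (lam0 + mu0) ** 2
    prefactor = (lam0 + mu0) / (2.0 * lam0)
    central = 0.5          # (2M)! / (2**(2M) * (M!)**2) at M = 1
    x_power = x            # x**M at M = 1
    probs = []
    for m in range(1, m_max + 1):
        probs.append(prefactor * central * x_power / (2 * m - 1))
        central *= (2 * m + 1) / (2 * m + 2)   # step the central factor to M + 1
        x_power *= x
    return probs
```

For $\lambda_0 = 1$, $\mu_0 = 2$ the first three entries are $2/3$, $4/27$ and $16/243$, and a few hundred terms already sum to unity within rounding error.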


6. The determination of the process when its mean growth, $\bar{n}_t$, is given.
Since $\bar{n}_t = e^{-\rho(t)}$, it follows that

(55)  $$\lambda(t) - \mu(t) = \frac{d}{dt}\log \bar{n}_t,$$

and thus if $\bar{n}_t$ is required to be a given function of the time, the birth and death
rates must be chosen in accordance with (55); the only other condition is that
for all $t$, $\lambda(t) \ge 0$ and $\mu(t) \ge 0$.

Arley has pointed out that the simple process ($\lambda(t) = c$, $\mu(t) = 0$) gives a
smaller fluctuation, $\mathrm{Var}(n_t)$, than any other simple process with the same mean
growth, say $(\lambda_0, \mu_0)$ where $\lambda_0 - \mu_0 = c$. This suggests that one should consider
the more general question: if $\bar{n}_t$ is given for all $t$, for which choice of the functions
$\lambda(t)$ and $\mu(t)$ will the fluctuation $\mathrm{Var}(n_t)$ be a minimum?

Suppose then that the whole region $t > 0$ consists of three sets of intervals,
$E_1$, $E_2$ and $E_3$, and that within an interval of the set $E_j$,

$\bar{n}_t$ is a decreasing function if $j = 1$,
$\bar{n}_t$ is an increasing function if $j = 2$,
and $\bar{n}_t$ is a constant if $j = 3$.

Then one can write

$$\mathrm{Var}(n_t) = e^{-2\rho}\,\Sigma_1\big[e^{\rho(\tau)}\big] + 2e^{-2\rho}\int_{E_1} e^{\rho(\tau)}\lambda(\tau)\,d\tau$$
$$\qquad + e^{-2\rho}\,\Sigma_2\big[-e^{\rho(\tau)}\big] + 2e^{-2\rho}\int_{E_2} e^{\rho(\tau)}\mu(\tau)\,d\tau$$
$$\qquad + e^{-2\rho}\int_{E_3} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau,$$

where $\Sigma_j[\,\cdot\,]$ denotes the sum, over the intervals of $E_j$ lying in $(0, t)$, of the
increments of the bracketed function, and the integrals are taken over the parts of
$E_1$, $E_2$ and $E_3$ lying in $(0, t)$. Here the terms involving $\lambda$ and $\mu$ explicitly are
all non-negative, and so $\mathrm{Var}(n_t)$ will be a minimum for the (unique) choice of
$\lambda$ and $\mu$ which makes them all vanish, namely:

(56)  in $E_1$, $\lambda(t) = 0$ and $\mu(t) = -\dot{\bar{n}}_t/\bar{n}_t$;
      in $E_2$, $\lambda(t) = \dot{\bar{n}}_t/\bar{n}_t$ and $\mu(t) = 0$;
      in $E_3$, $\lambda(t) = \mu(t) = 0$.

However, when one is looking for a $(\lambda, \mu)$ process with a given $\bar{n}_t$ function,
this minimum-fluctuation solution would frequently be an artificial one. For
example, suppose it is required that $\bar{n}_t$ shall be a Gaussian curve, reducing to
unity when $t = 0$; then

(57)  $$\bar{n}_t = e^{\lambda_0 t - \frac{1}{2}\mu_1 t^2},$$

say, and $\lambda(t) - \mu(t) = \lambda_0 - \mu_1 t$; the most natural solution is then the Arley
process,

$$\lambda(t) = \lambda_0, \qquad \mu(t) = \mu_1 t.$$

It is of interest that a $(\lambda, \mu)$ process can be found for which the expected growth
follows a logistic law,

(58)  $$\bar{n}_t = \frac{a}{1 + (a - 1)e^{-\beta t}} \qquad (a > 1,\ \beta > 0).$$

According to (55) one must have

$$\lambda(t) - \mu(t) = \frac{(a - 1)\beta}{e^{\beta t} + (a - 1)}.$$

The minimum-fluctuation solution is thus the purely reproductive process

(59)  $$\lambda(t) = \frac{(a - 1)\beta}{e^{\beta t} + (a - 1)}, \qquad \mu(t) = 0,$$

which satisfies the relation

(60)  $$\lambda(t) = \beta\left(1 - \frac{\bar{n}_t}{a}\right),$$

as might have been expected, since the Verhulst-Pearl-Reed differential equation
(which forms the deterministic basis for the logistic law) is

(61)  $$\frac{d\bar{n}_t}{dt} = \beta\,\bar{n}_t\left(1 - \frac{\bar{n}_t}{a}\right).$$


7. “Periodic” birth-and-death processes. As a further example of the general 
theory it is worth considering the “periodic” processes for which the expected 
growth fit is a function of the time which repeats itself with the period «. It 
will then follow that p(t) and so also X (t) — p(t) have the period «, while p(t) 
must be zero whenever t is an integer multiple of w. The only cases of interest 
are those in which X and p are separately periodic, and then it can be seen from 
(14c) that 

f 5 

(62) n — no and Yar (n) = kn 0 J e p(T) {X(r) + p{r)\dr, 

whenever t = ku, for every positive integer k. Thus, although the expected 
value of n* repeats itself regularly, in practice this “periodicity” would be ob¬ 
scured by the rapid increase, with increasing t, in the magnitude of the random 
fluctuations (as measured by Var (n*)). Moreover, since 

J e* iT) p(T)dr = k J e? {r) p(T)dr, 

it is clear that the process is necessarily transient, there being unit probability 
that n t will ultimately be reduced to zero. 

Periodic birth-and-death processes are likely to be of importance in biology; 
it should be pointed out, however, that this type of process describes the stochas¬ 
tic modification of a regular periodicity imposed on the model from outside, and 
it is not to be confused with other stochastic models which themselves generate 
irregular (non-phase-keeping) oscillations. The models discussed in this section 
are in fact suitable for the quantitative description of seasonal influences. 

Before going into further detail it is natural to specialise the model by assum¬ 
ing that the functions X and p are at most simply harmonic. If no = 1, and since 
there is to be no damping, one will then have 

(63) fl t - 


<« > 0 ), 



14 


DAVID G. KENDALL 


where v& - 2r, and a and e are amplitude and phase constants, respectively. 
The functions X and n are now to be determined from the relation 

X — p = av cos v(t + e), 

and this can be done in many ways. The minimum-fluctuation solution would
here be artificial, and it is more natural to select two other solutions,

(64) $\lambda = a\nu\{1 + \cos \nu(t + \epsilon)\}$, $\mu = a\nu$,

and

(65) $\lambda = a\nu$, $\mu = a\nu\{1 - \cos \nu(t + \epsilon)\}$,

for further consideration. In the first of these the death rate is constant and
the birth rate executes simple-harmonic oscillations, while in the second it is the
birth rate which is constant, and the death rate which oscillates. It can be seen
that, of all solutions of these two types, (64) and (65) are those with the least
value for $\mathrm{Var}(n_t)$. From formulae (14a) and (14b) it will be found that, for
either process,

(66) $\mathrm{Var}(n_t) = 4\pi k a I_0(a) e^{a \sin \nu\epsilon}$ when $t = k\omega$,

where $I_0(a)$ is the Bessel function of zero order, of the first kind and of imaginary
argument. (It will be noticed that, whenever $t$ is an integer multiple of $\omega$, the
distribution of the population size $n_t$ is the same for the two models.) For small
oscillations, when $t = k\omega$,

(67) $\mathrm{Var}(n_t) \sim 4\pi k a$ as $a \to 0$,

since $I_0(0) = 1$, while for large oscillations

(68) $\mathrm{Var}(n_t) \sim 2k(2\pi a)^{1/2}/n_{\min}$ as $a \to \infty$.

(Here $n_{\min}$ is the minimum value of $\bar{n}_t$.)

The calculation of $P_0(\omega)$ presents some points of interest. For either model
it proves to be

(69) $P_0(\omega) = \frac{2\pi a I_0(a) e^{a \sin \nu\epsilon}}{1 + 2\pi a I_0(a) e^{a \sin \nu\epsilon}}$;

this is the probability that a population element, known to be descended from a
single individual at time $t = 0$, will have become extinct one year later (if one
identifies the oscillations with a seasonal effect). It will be seen that $P_0(\omega)$
will be least when $\sin \nu\epsilon = -1$, and greatest when $\sin \nu\epsilon = +1$; i.e. when
$\bar{n}_t$ is expected to have a minimum, or a maximum, at $t = 0$, respectively.
Accordingly it follows that the progeny of a new member of the population is most
likely to survive till the following year if the “ancestor” commences its
“membership” at a time of year when the population would normally have its
minimum value.
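Formulae (66) and (69) lend themselves to a quick numerical check. The following is a
minimal sketch, not taken from the paper: it evaluates $I_0(a)$ from its power series
and compares (69) with a direct quadrature for model (64). It assumes the general
expression $P_0(t) = J/(1 + J)$ with $J = \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau$ and
$\rho(t) = \int_0^t (\mu - \lambda)\,d\tau$, which is consistent with (69); all function
names are hypothetical.

```python
import math


def bessel_i0(a, terms=60):
    """I_0(a) from its power series: sum over j of (a/2)^(2j) / (j!)^2."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (a / 2.0) ** 2 / ((j + 1) ** 2)
    return total


def variance_after_k_periods(a, eps, nu, k):
    """Formula (66): Var(n_t) at t = k*omega, where nu*omega = 2*pi."""
    return 4.0 * math.pi * k * a * bessel_i0(a) * math.exp(a * math.sin(nu * eps))


def extinction_after_one_period(a, eps, nu):
    """Formula (69): P_0(omega) = J / (1 + J)."""
    j = 2.0 * math.pi * a * bessel_i0(a) * math.exp(a * math.sin(nu * eps))
    return j / (1.0 + j)


def extinction_by_quadrature(a, eps, nu, steps=20000):
    """Assumed general form J/(1+J) for model (64), where mu = a*nu is constant.

    J is the integral of mu*exp(rho) over one period (midpoint rule), with
    rho(tau) = -a*(sin(nu*(tau + eps)) - sin(nu*eps)).
    """
    omega = 2.0 * math.pi / nu
    h = omega / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        rho = -a * (math.sin(nu * (tau + eps)) - math.sin(nu * eps))
        total += a * nu * math.exp(rho) * h
    return total / (1.0 + total)
```

With `nu = 2*math.pi` (one period per unit of time), `eps = 0.75` gives
$\sin \nu\epsilon = -1$ and `eps = 0.25` gives $\sin \nu\epsilon = +1$, so the two
extreme cases discussed above can be compared directly.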





In conclusion, I wish to thank Professor M. S. Bartlett for many helpful dis¬ 
cussions on the subject of this paper. 

REFERENCES 

[1] W. Feller, “Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung”, Acta Biotheoretica, Vol. 5 (1939),
pp. 11-40.

[2] M. S. Bartlett, Stochastic Processes (notes of a course given at the University of
North Carolina in the Fall Quarter, 1946). It is understood that copies of these
notes are available on request.

[3] N. Arley, On the Theory of Stochastic Processes and their Application to the Theory
of Cosmic Radiation, G. E. C. Gads Forlag, Copenhagen, 1943, pp. 106-114.

[4] G. N. Watson, The Theory of Bessel Functions, University Press, Cambridge, England,
1944.

[5] N. Arley and V. Borchsenius, “On the theory of infinite systems of differential
equations and their application to the theory of stochastic processes and the
perturbation theory of quantum mechanics”, Acta Mathematica, Vol. 76 (1945),
pp. 261-322 (esp. 298-9).

[6] D. G. Kendall, “On some modes of population growth leading to R. A. Fisher's
logarithmic series distribution”. To appear in Biometrika.

[7] Francis Galton, Natural Inheritance, Macmillan, London, 1889.



PROBABILITY OF COINCIDENCE FOR TWO PERIODICALLY
RECURRING EVENTS¹

By Paul I. Richards 
Brookhaven National Laboratory 

Summary. This paper contains a study of the following problem: Each of
two events recurs with definitely known period and duration, while the starting 
time of each event is unknown. It is desired that, before the elapse of a certain 
time, the events occur simultaneously and that this “overlap” be of at least a 
given minimum duration. 

The probability of this satisfactory coincidence is first evaluated, and it is 
found that the solution, while mathematically adequate, is of no value for prac¬ 
tical application. This circumstance arises from the possibility that, with 
certain rational ratios of the periods, the events may “lock in step”. Accord¬ 
ingly, an attempt is made to smooth the probability function with respect to 
small variations in the ratio of the periods. Due to difficulties in manipulating 
the number-theoretic expressions involved, this smoothing is carried through 
only by the use of certain approximations. Moreover, because of these same 
difficulties, an averaged value of the probability itself is not obtained, but, in 
its stead, there is derived a formula for that fraction of randomly related repeated 
trials in which the original probability will be less than one-half. 

Thus, the original problem is not completely solved. The results obtained, 
however, do allow one to compare the relative advantages of different situations 
and to make a rough estimate of the likelihood of success. Generally speaking, 
the analysis is applicable whenever the ratio of “on time” to “off time” is small 
for each event. 

1. Introduction. Our problem may be represented schematically as follows:
Consider two pulse waves (Fig. 1) of periods $T_1$, $T_2$, pulse widths $t_1$, $t_2$, and
phases $\phi_1$, $\phi_2$. It is desired that these pulses overlap at least once within a
given time interval; moreover, an overlap is not satisfactory unless its duration is
at least as great as some assigned $t_m$. The starting phases $\phi_1$ and $\phi_2$ are
unknown for both waves. Our problem, then, would appear to be to calculate as a
function of time the probability of at least one overlap of duration at least $t_m$.

This probability will be calculated later, and, while mathematically adequate,
is totally useless for practical application. This rather unusual occurrence in
applied mathematics arises from sources generally kept in mind only by
experimental physicists. Namely, the very nature of the science of measurement,
involving as it always does at some stage the use of the human senses, precludes
the availability of mathematically exact values of the parameters of the problem.
In other words, although experimental error can sometimes be made amazingly
small, it can never be eliminated.

¹ This work was done in part under Contract No. OEMsr-411 between Harvard
University and the Office of Scientific Research and Development, which assumes no
responsibility for the accuracy of the statements contained herein.

Now, as might be expected from the possibility that the waves may “lock in
step”, our probability is extremely erratic with respect to very minute changes
in the periods $T_1$, $T_2$. For example, let $T_1 = T_2$ = 10% = 10% ($t_m = 0$); a
simple direct calculation then shows that, for all times greater than $T_1 = T_2$, the
desired probability is 0.03. Now if we let $T_1 = T_2 + \epsilon$, one wave will “creep up”
on the other, and eventually (for times greater than $T_1 T_2/\epsilon$) the probability is
unity! Thus it may very well happen in a practical application that the parameters
are known to an accuracy essentially sufficient only to give the obvious
result: $0 \le P \le 1$.




In the practical problem originally considered, uncertainty in the data arose 
not only from experimental error but also from slight instability of equipment. 
Thus some means of averaging over variations in the periods had to be found 
if the analysis was to be of any practical value whatsoever. 

For reasons which will appear in the later analysis, this smoothing entails
difficulties which the author was unable to overcome with any great success; the
nature of the results which have been obtained is discussed in the next section.
These results involve several approximations which, generally speaking, are
based on the assumption that the ratios $t_i/T_i$ are both small.

It might be noted finally that the obviously favorable situations $t_1 > T_2$ or
$t_2 > T_1$ often cannot be used because of numerous practical difficulties.

2. Results. In this section, we shall summarize the results of the later 
analysis for the benefit of those readers not interested in the latter. At the end 
of this section, there is an outline of the practical application of the formulas. 



Fig. 1 





We shall continue to use the notation already introduced:

(1) $t_1$, $t_2$ = durations of the events;
    $T_1$, $T_2$ = periods of the events;
    $t_m$ = minimum satisfactory duration of coincidence; and
    $P$ = probability of at least one satisfactory coincidence.

We shall also use the (at present) rather arbitrary notation:

(2) $t = (\text{time} - t_m)$,
    $P_0 = (t_1 - t_m)(t_2 - t_m)/T_1 T_2$,
    $w = (t_1 + t_2 - 2t_m)/T_1 T_2$.

The probability function for short time intervals is:

(3) $P = P_0 + wt$, for $t < \mathrm{Max}(T_1, T_2)$.

In any case:

(4) $P \le P_0 + wt$.
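The linear law (3) can be checked by brute force on a grid of starting phases. The
sketch below is illustrative only: it assumes that wave $i$ is "on" during
$[\phi_i + nT_i, \phi_i + nT_i + t_i]$ for every integer $n$, that $t_i \le T_i$, and that a
coincidence counts only if at least $t_m$ of common "on" time lies inside the
observation window $[0, \text{time}]$. The function names and the grid resolution are
hypothetical choices, not part of the original analysis.

```python
def satisfactory_overlap(phi1, phi2, T1, T2, t1, t2, tm, horizon):
    """True if some pair of pulses shares at least tm of 'on' time inside [0, horizon]."""
    for n in range(-1, int(horizon // T1) + 1):      # n = -1: pulse already on at time 0
        a = phi1 + n * T1                            # start of a pulse of wave 1
        for m in range(-1, int(horizon // T2) + 1):
            b = phi2 + m * T2                        # start of a pulse of wave 2
            common = min(a + t1, b + t2, horizon) - max(a, b, 0.0)
            if common >= tm:
                return True
    return False


def grid_probability(T1, T2, t1, t2, tm, horizon, n_grid=200):
    """Fraction of a midpoint grid of phases (phi1, phi2) giving a satisfactory overlap."""
    hits = 0
    for i in range(n_grid):
        phi1 = (i + 0.5) * T1 / n_grid
        for j in range(n_grid):
            phi2 = (j + 0.5) * T2 / n_grid
            if satisfactory_overlap(phi1, phi2, T1, T2, t1, t2, tm, horizon):
                hits += 1
    return hits / float(n_grid * n_grid)


def short_time_probability(T1, T2, t1, t2, tm, horizon):
    """Formula (3): P = P0 + w*t with t = horizon - tm, using the notation (2)."""
    p0 = (t1 - tm) * (t2 - tm) / (T1 * T2)
    w = (t1 + t2 - 2.0 * tm) / (T1 * T2)
    return p0 + w * (horizon - tm)
```

For instance, with the made-up values $T_1 = 10$, $T_2 = 7$, $t_1 = 1$, $t_2 = 0.5$,
$t_m = 0.2$ and an observation time of 3, the grid estimate and (3) should agree to
within the resolution of the grid.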

As already explained, the functional dependence of $P$ for large $t$ is of no
practical use due to its extremely erratic variation with small changes in the
periods $T_1$, $T_2$.

For reasons which will later become apparent, the only type of averaging which
has yet been carried to completion is the following. Consider that many trials
of equal length are made and that in each individual trial, all the parameters
are, by some mysterious device, held constant with absolute, mathematical
exactitude. Assume for definiteness that $T_2 \le T_1$. Between different trials,
let $t_i$ and $T_2$ vary in such a way that $T_1/T_2$ takes all values within a range of
$\frac12$ with equal probability. (In the original problem, the ratios $t_i/T_i$
necessarily remained constant.) The quantity $f$ given below then represents that
fraction of the trials in which the rigorous probability is less than an assigned
value = $P_0 + Q$. Thus the smaller $f$ is, the greater are the chances of success.

It must be admitted that this method assumes several things which are not
true in practice. First, the parameters of the problem probably vary by at
least a percent even within a single trial. More serious, the required variation
in $T_1/T_2$ may, in the extreme case $T_1 = T_2$, demand as much as 33% variation
in $T_2$. While considerable variation does occur, it is doubtful that it attains
this magnitude. Finally, the method assumes that $T_1$ stays fixed as $T_2$ varies,
whereas actually $T_1$ and $T_2$ vary simultaneously.

Despite these drawbacks, it was felt that the results were meaningful for the 
practical problem. In any case, they must serve until a more adequate analysis 
can be carried through. 

The reader will notice that the final results have the form of a “probability of a
probability”. It would thus seem that a simple integration would yield a true
probability, but, unfortunately, the formulas for $f$ are reasonably accurate only
for $Q \le \frac12$. The final formula for $f$ = fraction of trials in which $P < P_0 + Q$ is:

(5) $f = 1$, for $tw < Q$;
    $f = 1.216\, Q \left\{1 + \left(\frac{tw}{Q} - 1\right) \log\left(1 - \frac{Q}{tw}\right)\right\}$, for $tw > Q$, $Q \le 1/2$.

This expression is subject to error from several sources. First, it is an
approximation to a number-theoretic formula given in (31); this approximation is
best for $t$ and $Q/w$ large compared to $\mathrm{Max}(T_1, T_2)$. A completely general
comparison of (31) and (5) = (33) is given in Fig. 2, where the agreement will be
seen to be quite adequate even for relatively small $t$ and $Q/w$. (The dotted
contours are straight lines passing through the origin.) When $t$ and $Q/w$ are small
this first source of error can be eliminated by using the solid contours of Fig. 2
in place of (5).

Secondly, formula (31) itself is an approximation and involves the use of
simplified probability formulas and an assumption that $P_0$ and $w$ are constant
as $T_2$ varies. The maximum possible magnitude of these errors in (31) is given
by (parentheses indicate functional dependence):

(6) $f(t\bar{w}, Q - p_0 - q) \le f(tw, Q) \le f(t\underline{w}, Q + p_0 + q)$,

where, as $T_2$ varies,

    $\underline{w}$, $\bar{w}$ = minimum, maximum values of $w$,
    $p_0$ = change in $P_0$,
    $q$ = maximum value of $w^2 T_1 T_2$.

Generally speaking, these errors are small if $t_i/T_i$ are small and if $t$ is large
compared to $\mathrm{Max}(T_1, T_2)$. Also, there is considerable possibility that certain
errors will cancel in such a way as to make (6) correct with $q = 0$.

We shall now outline the practical use of these results. Given nominal values
of the parameters defined in (1), choose a convenient value for $Q \le \frac12$ (usually
$Q = \frac12$), and substitute into (2) to find $tw/Q$. From (5), one may then determine
$f$ = fraction of trials in which $P < P_0 + Q$. (Low values of $f$ are thus desirable.)
For computational convenience, (5) has been plotted in Fig. 3, while, above the
range of Fig. 3, the following lies within 1% of (5).

(7) $f = 0.608\,(Q^2/tw)$ for $tw > 10Q$.

Note also that (4) may often be of considerable use in quickly eliminating cases
of very poor probability, and recall also that (3) will give the true, directly
meaningful probability whenever $t$ is no greater than $\mathrm{Max}(T_1, T_2)$.
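The recipe just outlined reduces to two short formulas. The sketch below is a
hypothetical helper, not part of the paper: it evaluates (5) and the large-$tw$ form
(7) from the two dimensionless inputs $tw$ and $Q$. At the boundary $tw = Q$, which (5)
leaves undefined, the sketch returns 1 by assumption.

```python
import math


def fraction_below(tw, Q):
    """Formula (5): fraction f of trials in which P < P0 + Q (requires 0 < Q <= 1/2)."""
    if not 0.0 < Q <= 0.5:
        raise ValueError("formula (5) is stated only for 0 < Q <= 1/2")
    if tw <= Q:                 # (5) gives f = 1 for tw < Q; tw == Q is an assumption
        return 1.0
    x = Q / tw
    return 1.216 * Q * (1.0 + (1.0 / x - 1.0) * math.log(1.0 - x))


def fraction_below_large_tw(tw, Q):
    """Formula (7): leading term of (5), stated in the text for tw > 10 Q."""
    return 0.608 * Q * Q / tw
```

For example, with $Q = 1/2$ and $tw = 5$, `fraction_below` and
`fraction_below_large_tw` differ by roughly 0.001 in absolute terms.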

Evaluation of the maximum possible error in $f$ as so obtained is more
complicated. If $t$ and $Q/w$ are small, Fig. 2 may be used to eliminate inexactness
due to the approximation of (31) by (5) = (33). Otherwise, this error may
safely be assumed to be negligible (less than 0.025; (31) may be employed
directly, but this is laborious unless $Q/w$ is small). The remaining errors, given
by (6), may change depending on how $T_2$ is assumed to vary. To make these
bounds as close as possible, it is best to choose $T_2 = \mathrm{Min}(T_1, T_2)$ and then let
$T_2$ decrease from its nominal value by an amount sufficient to cause $T_1/T_2$ to
increase by $\frac12$.

The reader may have noticed that $f$ has a jump discontinuity as $t$ passes
through the value $Q/w$. This is not the result of approximations; it occurs also
in the number-theoretic formula (excepting only when $\mathrm{Max}(T_1, T_2) = 1/(2w)$ and
$Q = \frac12$) and merely means that the “lock in” phenomena are suddenly able to
have an effect when $t$ becomes greater than $Q/w$.







3. The probability function. Our problem has already been represented by
the pulse waves of Fig. 1. The starting phases $\phi_1$, $\phi_2$ of the waves are random,
and we desire the probability $P$ of at least one overlap of duration at least $t_m$
within a given time interval. Manifestly $P = 0$ until time $t_m$; hence we shall
give $t$ the meaning already assigned in (2).

Consider any sub-interval of width $t_m$. The range of phases favorable to
satisfactory coincidence on this interval is easily seen to be a rectangle with
sides $(t_1 - t_m)$, $(t_2 - t_m)$ in the phase plane $(\phi_1, \phi_2)$. By proper choice of the
(arbitrary) zero-phase reference, the small rectangle favorable to coincidence on
$(0, t_m)$ can be made to fall in the lower left corner of the phase plane (Fig. 4).



As we allow the sub-interval (width $t_m$) to advance in time, this small rectangle
will sweep out along a 45° line (Fig. 4); its horizontal displacement = vertical
displacement is given by $t$ as defined in (2). Since the phases must be measured
modulo the periods, we must “switch back” the strip whenever it begins to leave the
large rectangle: $0 \le \phi_1 \le T_1$, $0 \le \phi_2 \le T_2$; this is illustrated in Fig. 5.

The desired probability is then the area covered at least once by the strip
divided by $(T_1 T_2)$, the total available area of the phase plane.

Using Fig. 4, one can easily show that, before the strip begins to overlap itself:

(8) $P = P_0 + wt$,

where $t$, $P_0$, $w$ are defined in (2).

A rectangle with opposite sides identified, as in Fig. 5, is topologically
equivalent to a torus. This gives a good geometric picture of the overlap phenomena.
The strip winds diagonally about the torus until eventually (in general after
several full circuits) it strikes sufficiently near its starting point to overlap itself
on one edge. It then begins to fill the chinks between the previous circuits, and
this single overlap continues until the chinks are almost filled. The strip then
approaches its starting point from the side opposite to that on which single
overlap occurred. Thereafter, only the center section of the strip is effective in
increasing the area covered. This double overlap continues until the entire
torus has been covered. A degenerate case is possible in which the strip, upon
its first overlap, begins to retrace exactly its former path and the torus is never
fully covered. This corresponds to interlocking of the original waves of Fig. 1.

A rigorous proof of the above statements may be constructed by using the
fact that each change in behavior can occur only at the starting point. In this
manner, it is easily shown that: (a) single and double overlap occur in that order,
(b) the strip area effective in covering changes only upon a change in the type of
overlap, and (c) the two types of overlap must occur on opposite sides of the
starting point.

Fig. 4          Fig. 5

The facts (a, b, c) may then be used to derive the probability function. For
the analytic analysis, it is best to return to the $(\phi_1, \phi_2)$ plane. Overlap of any
type will first occur when the “unswitched-back” strip approaches sufficiently
near a point $(n_1 T_1, n_2 T_2)$, where $n_1$ and $n_2$ are non-negative integers not both zero.
The analysis is greatly shortened by noticing that the behavior is completely
determined by the distance of the line $\phi_1 = \phi_2$ from such points (even though
the strip is not centered on this line), while the width of the strip is (Fig. 4)
$wT_1T_2/\sqrt{2}$.

A slight fine-structure may arise in the probability function where it changes
slope, depending on whether or not the leading corner of the moving rectangle
strikes one of the sides of the original small rectangle. These effects are small
if $t_i/T_i$ are small and will be neglected below by supposing the strip to be
generated by a line segment oriented perpendicularly to its path. The error arising
from this procedure consists essentially in a delay or advance in the time at
which $P$ changes slope. It may be seen that the maximum effect represents a
delay of $\Delta t = wT_1T_2/2$. The error introduced is then less than $\Delta t\sqrt{2}$ multiplied
by that portion of the total width of the strip which becomes ineffective due to
the overlap considered. The sum of these effects must be less than that given
by using the total width of the strip; this gives the maximum error $w^2 T_1 T_2/2$.

The results of the method outlined are then as follows. Single overlap occurs
at $t = s$, where

(9) $s = \frac12 (m_1 T_1 + m_2 T_2)$,

and $(m_1, m_2)$ is that pair of non-negative integers not both zero such that $s$ is a
minimum and

(10) $p_1 = \left| \frac{m_1}{T_2} - \frac{m_2}{T_1} \right| < w$.

Double overlap occurs at $t = d$, where

(11) $d = \frac12 (n_1 T_1 + n_2 T_2)$,

and $(n_1, n_2)$ is that pair of non-negative integers not both zero such that $d$ is a
minimum and the conditions

(12) $\left| \frac{n_1}{T_2} - \frac{n_2}{T_1} \right| < w$, $\left( \frac{n_1}{T_2} - \frac{n_2}{T_1} \right) \left( \frac{m_1}{T_2} - \frac{m_2}{T_1} \right) < 0$

are satisfied. With

(13) $p_2 = p_1 + \left| \frac{n_1}{T_2} - \frac{n_2}{T_1} \right| - w$,

the probability function is then

(14) $P = P_0 + wt$, for $t \le s$,
     $P = P_0 + sw + (t - s)p_1$, for $s \le t \le d$,
     $P = P_0 + sw + (d - s)p_1 + (t - d)p_2$, for $d \le t$,

where it is understood that $P = 1$ if (14) gives $P > 1$.

The degenerate case where the waves interlock is given correctly by this
formalism. Namely, if the strip starts to retrace its path exactly, then $p_1 = 0$
and the second part of (12) shows that $d$ does not exist. Equation (14) then
gives the correct result: $P$ rises to the value $P_0 + sw$ and never increases further.
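The piecewise-linear law of this section can be turned into a short routine. The
sketch below is one possible reading, not the author's procedure: it assumes the
small-ratio regime $wT_1 < 1/2$, so that for each $m_1 \ge 1$ only the nearest integer
$m_2$ to $m_1 T_1/T_2$ needs to be tried, and it stops the search at a hypothetical
bound `max_index`. If no pair of opposite sign is found within that bound, double
overlap is treated as absent, which is exact for the interlocked case $p_1 = 0$.

```python
def overlap_parameters(T1, T2, w, max_index=10000):
    """Search for (s, p1, d, p2) of (9)-(13); d and p2 are None if no double overlap is found."""
    if w * T1 >= 0.5:
        raise ValueError("this sketch assumes w*T1 < 1/2 (small pulse-to-period ratios)")
    single = None                                   # (s, signed gap) of the first single overlap
    for m1 in range(1, max_index + 1):
        m2 = round(m1 * T1 / T2)                    # nearest lattice index on the other wave
        gap = m1 / T2 - m2 / T1                     # signed quantity inside (10) and (12)
        if abs(gap) >= w:
            continue
        time = 0.5 * (m1 * T1 + m2 * T2)            # (9) or (11)
        if single is None:
            single = (time, gap)
        elif gap * single[1] < 0:                   # opposite side: second half of (12)
            s, gap_s = single
            p1 = abs(gap_s)
            return s, p1, time, p1 + abs(gap) - w   # (13)
    if single is None:
        raise ValueError("no overlap found; increase max_index")
    return single[0], abs(single[1]), None, None


def coincidence_probability(t, T1, T2, t1, t2, tm, max_index=10000):
    """Formula (14), with t = (time - tm) as in (2), capped at 1."""
    p0 = (t1 - tm) * (t2 - tm) / (T1 * T2)
    w = (t1 + t2 - 2.0 * tm) / (T1 * T2)
    s, p1, d, p2 = overlap_parameters(T1, T2, w, max_index)
    if t <= s:
        p = p0 + w * t
    elif d is None or t <= d:
        p = p0 + s * w + (t - s) * p1
    else:
        p = p0 + s * w + (d - s) * p1 + (t - d) * p2
    return min(p, 1.0)
```

With equal periods the routine reproduces the interlocked behaviour: the value
returned stops growing once $t$ passes $s$.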


4. The method of smoothing. We have already discussed in section 1 the
inadequacy of the formal mathematical solution (14) for purposes of practical
application. Either mathematical analysis or intuitive consideration of interlock
shows that the erratic behavior of $P$ is due almost entirely to small changes
in the ratio $T_1/T_2$. As this ratio passes through certain rational values,
possibilities of interlock appear and disappear. Consequently, we next alter (14)
to a form in which the dependence on this ratio is more evident.

We may, without loss of generality, assume:

(15) $T_1 = 1$, $T_2 \le 1$.

Also introduce the standard notation:

(16) $[x]$ = (largest integer $\le x$).

It will then be seen that (10) and (12) may be thrown into the form:²

(17) $k$ = smallest positive integer such that $p_1 = |ke - i| < w$ ($i$ = integer);

(18) $K$ = smallest positive integer such that $|Ke - I| < w$ and also
     $(ke - i)(Ke - I) < 0$ ($I$ = integer),

where either

(19) $e = \frac{1}{T_2} - \left[\frac{1}{T_2}\right]$, or $e = 1 + \left[\frac{1}{T_2}\right] - \frac{1}{T_2}$.

Now from (9) and (10), we note that $s$ differs from $m_1 T_1$ by at most $wT_1T_2/2$,
while from (11) and (12), $d$ differs from $n_1 T_1$ by less than the same amount.
Moreover, by the second half of (12), $d$ is thereby made too small if $s$ has been
made too large and vice versa. Hence the use of these approximations in (14)
will contribute an error certainly less than $w^2 T_1 T_2/2$. Adding the error
discussed in section 3, the total introduced thus far cannot exceed $w^2 T_1 T_2$.

We thus use in the present notation $s = k$, $d = K$; (13) and (14) then become:

(20) $p_2 = p_1 + |Ke - I| - w$,

(21) (a) $P = P_0 + wt$, for $t \le k$,
     (b) $P = P_0 + kw + (t - k)p_1$, for $k \le t \le K$,
     (c) $P = P_0 + kw + (K - k)p_1 + (t - K)p_2$, for $K \le t$,

where, as before, $P = 1$ if (21) gives a value greater than unity. Equations
(17)-(21) are the formulation which will be used, with conditions (15),
henceforth.
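In the normalized form, the indices $k$ and $K$ of (17) and (18) can be found by a
direct scan over multiples of $e$. The sketch below is a minimal illustration with
hypothetical names and a hypothetical search bound `max_index`; it returns `None`
for $K$ when no multiple of opposite sign is found within that bound (as happens in
the interlocked case, where $ke - i = 0$).

```python
def signed_offset(n, e):
    """Signed distance of n*e from the nearest integer, i.e. (n*e - i) in (17)."""
    x = n * e
    return x - round(x)


def single_and_double_index(e, w, max_index=100000):
    """Return (k, K) as defined in (17) and (18); K is None if not found."""
    k = None
    offset_k = 0.0
    for n in range(1, max_index + 1):
        offset = signed_offset(n, e)
        if abs(offset) >= w:
            continue
        if k is None:
            k, offset_k = n, offset            # first multiple within w of an integer
        elif offset * offset_k < 0:            # within w and on the opposite side
            return k, n
    return k, None
```

As a usage example, $e = 0.382$ with $w = 0.05$ gives $k = 13$ (offset $-0.034$) and
$K = 21$ (offset $+0.022$), whereas the rational value $e = 0.375 = 3/8$ gives $k = 8$
with zero offset and no $K$.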

We wish now to smooth $P$ with respect to variations in $e$. The number-theoretic
requirement (17) is extremely difficult to work with. For reasons of
simplicity, then, we shall assume that $e$ is the only parameter which changes as
$T_2$ is varied. The errors which may arise from this assumption are treated at
the end of section 5.

² Note that, even though the periods appear explicitly only in (19) hereafter, all the
following equations are true only for $T_1 = 1$. (This is evident if we recall that $w$
has the dimensions of inverse time.) Thus we are definitely assuming that
$T_1$ = constant.

From (19) (or from the absolute value signs in (17), (18)) it will be seen that
all possible situations arise if $e$ varies merely from zero to one-half. In order
that this should entail as little variation in $T_2$ as possible, our conventions should
be chosen as already stated in (15). Even under these circumstances, a maximum
variation of 33% in $T_2$ may be required to cover the range $e = 0$ to $\frac12$.

Equation (21) cannot be used directly without the interpretational convention
there noted. This leads to difficulties of treatment which the author was unable
to solve. The difficulties may be avoided by the following device, which
admittedly has less direct significance than an averaged value for $P$.

We enquire after the fraction $f$ of the range of $e$ over which $P$ has a value (at
fixed $t$) less than some given value $Q + P_0$. We may then say that, if a large
number of trials each of length $t$ is made, then in $f$ of them, the probability of
coincidence will be less than $Q + P_0$.

5. Calculation of $f$. The exceptional behavior of $P$ is that caused by interlock
possibilities. This corresponds to $p_1 = 0$ in (17). Thus the exceptional values
of $P$ center about the points $e = i/k$, where $i$ and $k$ are relatively prime
(otherwise, $k$ would not be the smallest integer satisfying (17)). Moreover, by a
standard theorem [1], $k < 1/w$. Thus the critical points form the Farey series
of order $1/w$ in the range $(0, \frac12)$. About each Farey point, we may suspect that
there will be an interval over which $k$ is constant, and that the entire range may
thereby be divided up into ranges of constant $k$.
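The critical points are easy to enumerate. The sketch below is an illustrative
helper, not part of the original analysis: it lists the Farey fractions $i/k$ in
lowest terms with $k$ up to a given integer order and value at most $1/2$. The order is
passed as an integer (for example 20 when $w = 0.05$) to avoid rounding $1/w$ in
floating point.

```python
from fractions import Fraction


def farey_points(order, upper=Fraction(1, 2)):
    """Sorted fractions i/k (reduced) with 1 <= k <= order and 0 <= i/k <= upper."""
    points = set()
    for k in range(1, order + 1):
        for i in range(0, k + 1):
            value = Fraction(i, k)      # Fraction reduces to lowest terms automatically
            if value <= upper:
                points.add(value)
    return sorted(points)
```

For order 5 this gives 0, 1/5, 1/4, 1/3, 2/5, 1/2; the denominator of each returned
fraction is the index $k$ associated with that critical point.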

In thinking about the use of (17) in a typical calculation, it is convenient to 
eliminate the integer i by representing multiples of e as a series of points pro¬ 
gressing around and around a circle of unit circumference. When e = i/k , the 
kth multiple will (after i revolutions) coincide with the origin; this and the 
earlier points, it is easily shown, will be distributed uniformly about the circle 
with a separation 1/k. 

As e moves away from the Farey point, k will, by definition (17), remain con¬ 
stant until either (a) the point ke moves a distance greater than w from the 
origin or (b) an earlier point moves to a distance less than w from the origin 
(Fig. 6). 

Let ($me$) be that earlier point nearest (initially $1/k$ from) the origin and moving
toward it as $e$ varies in a particular direction. Of course,

(22) $m < k$.

For each Farey point, there will be two values of $m$; one for decreasing $e$ and
one for increasing $e$. If we introduce the new variable: $h$ = the absolute value
of the change in $e$ from the Farey point $i/k$, then each point, $ne$, on the reference
circle will move a distance $nh$, and (17) gives as the conditions for constant $k$
(Fig. 7):

(23) (a) $w > kh = p_1$,
     (b) $mh < (1/k) - w$.

Thus we have divided the range $(0, \frac12)$ into small ranges where $k$ (and $m$) are
fixed. The number of small ranges is roughly twice the number of Farey points
in $(0, \frac12)$.

[Figure: reference circle; plot labels $e = 0.375$, $e = 0.382$, $w = 0.05$]

Within each small range $p_1$, $K$, $p_2$ still vary with $e$. The behavior of $p_1$ is
already given in (23a); we shall find that we do not need $p_2$. Using (18) and
Fig. 7, it may easily be shown that:

(24) $K = m + jk + k$,

where

(25) $j + a = (1 - mkh - kw)/(k^2 h)$; $j = [j]$, $0 \le a < 1$.

From (23a), (24), (25), we obtain:

(26) $(K - k)p_1 = 1 - kw - ak^2 h$ $(0 \le a < 1)$.

Having thus divided the range of $e$ into small regions within each of which the
number-theoretic requirements (17, 18) take a relatively simple form, we must
now turn to the calculation of $f$ = that fraction of the range $e = (0, \frac12)$ over which
$P < P_0 + Q$ at fixed $t$. We shall specialize the further analysis to the case
$Q \le \frac12$. This considerably shortens the discussion and yields essentially all the
useful results of the more general inquiry.

We first note from (21) that, since $p_2 < p_1 < w$ (i.e. because of (4)), we have
$P < P_0 + Q$ independently of $e$ if $t < Q/w$:

(27) $f = 1$, for $t < Q/w$.

Similar reasoning shows on the other hand that, when $t > Q/w$, those regions
with $k > Q/w$ do not contribute to $f$. In the following, we shall therefore
employ:

(28) $k < Q/w < t$, $Q \le \frac12$.

Equation (28) implies that we must use either (21b) or (21c); we shall next
show that we do not need (21c). The value of $P$ whenever (21c) is applicable is
certainly greater than $(P_0 + kw + (K - k)p_1)$. From (26), this value is equal to
$(P_0 + 1 - ak^2 h)$. Now from (28), $w < 1/2k$, whence by (23a) $h < 1/2k^2 < 1/2ak^2$
(since $a < 1$). Thus $(P_0 + 1 - ak^2 h) > P_0 + \frac12 \ge P_0 + Q$, and consequently
(21c) never applies until $P > P_0 + Q$. (This means merely that the double
overlap discussed in section 3 cannot occur until at least half the torus is covered.)
Accordingly, we can confine our attention entirely to (21b) in any further
discussion of $f$.



Substituting for $p_1$ from (23) and recalling that $(t - k)$ is positive (by (28)),
we find from (21b) that the condition $P < P_0 + Q$ becomes:

(29) $h < \frac{Q - kw}{k(t - k)}$.

However, $h$ is subject also to the restrictions (23), which insure that we do not
stray from the small region where $k$ is constant. We assert that (29) implies
(23) and may therefore be used as the final expression of the requirement
$P < P_0 + Q$.

To prove this, note first that (29) and (28) immediately give $h < w/k$, which
is (23a). Secondly, (28) implies $1/k > 2w$ so that, using (23a) and (22):
$(1/k) - w > w > kh > mh$, which is (23b).

Thus we arrive at the result that $f$ receives contributions only from those
elementary regions where $k$ satisfies (28) and that the contribution of each such
region is governed by (29).

Since the variable $h$ was defined as the absolute value of the change of $e$ from
the Farey point $i/k$, each Farey point (satisfying (28)) contributes an amount
equal to twice³ the right-hand side of (29). Since this amount is independent
of $i$, we may immediately sum over all Farey points $i/k$ with fixed $k$. There
are $\frac12\phi(k)$ such points³ in the range $(0, \frac12)$, where Euler's function $\phi$ is
defined by:

(30) $\phi(k)$ = the number of integers $\le k$ and relatively prime to $k$.

(Note that $\phi(k)$ is even for $k \ge 3$ since if $k$ and $i$ have no common divisor $> 1$,
neither do $k$ and $k - i$.)

³ This is not true of the Farey points 0 and $\frac12$, the ends of the range of $e$, but the
terms $k = 1, 2$ in (31) correctly account for these contributions since
$\phi(1) = \phi(2) = 1$.

Thus, summing over all these contributions and dividing by the length of the
total range:

(31) $f = 2 \sum_{1 \le k < Q/w} \phi(k) \frac{Q - kw}{k(t - k)}$, for $t > Q/w$.
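Formula (31) is straightforward to evaluate for moderate $Q/w$. The sketch below is
illustrative only and works in the normalized units of (15), so that $t$ and $1/w$ are
measured in units of $T_1$. Euler's function is computed naively by counting, which is
adequate for small $k$; the smooth expression from section 2, formula (5), is included
for comparison. All names are hypothetical.

```python
from math import gcd, log


def euler_phi(k):
    """Euler's function (30): count of integers 1 <= i <= k relatively prime to k."""
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)


def fraction_below_exact(t, w, Q):
    """Number-theoretic formula (31), with f = 1 for t <= Q/w as in (27)."""
    if t <= Q / w:
        return 1.0
    total = 0.0
    k = 1
    while k < Q / w:
        total += euler_phi(k) * (Q - k * w) / (k * (t - k))
        k += 1
    return 2.0 * total


def fraction_below_smooth(t, w, Q):
    """Smooth approximation (5) = (33), for comparison with (31)."""
    tw = t * w
    if tw <= Q:
        return 1.0
    return 1.216 * Q * (1.0 + (tw / Q - 1.0) * log(1.0 - Q / tw))
```

For the made-up values $Q = 0.5$, $w = 0.2$, $t = 3$, only $k = 1, 2$ enter the sum and
(31) gives $2(0.15 + 0.05) = 0.4$, against about 0.39 from the smooth formula.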

Regarding error in (31) due to the inaccuracy of (21), note that this can enter
only when we set $P = P_0 + Q$ in deriving (29). Actually the difference between
(21b) and the correct value of $P$ will change as $e$ is changed so that there is
considerable possibility that these effects will cancel out in (31). (In fact, a
detailed study shows that the error in (21b) assumes opposite signs as $e$ varies in
opposite directions from any given Farey point.) In any case, because (31) is
monotone in $Q$, the error in (31) can be no greater than that found by
substituting $Q \pm w^2 T_1 T_2$ for $Q$. Taking account also of the variation of $P_0$ with $T_2$,
the same argument establishes the “$Q$-dependence” of (6) given in section 2.

Finally, we investigate the error due to change in $w$ with $T_2$. If $\bar{w}$ is the
maximum value of $w$, Farey points with $k < Q/\bar{w}$ are certain to contribute to $f$, and
this contribution will be at least as great as $(Q - k\bar{w})/k(t - k)$ so that
$f \ge f(\bar{w})$. On the other hand, if $\underline{w}$ is the minimum value of $w$, Farey points
with $k > Q/\underline{w}$ cannot possibly contribute to $f$, and the remaining points can
contribute no more than $(Q - k\underline{w})/k(t - k)$ so that $f \le f(\underline{w})$. Hence we
arrive at the final statement (6) in section 2.

6. Approximations for $f$. Computational difficulties in the use of (31)
suggested approximating it by a more readily computed expression. By a standard
theorem [1, p. 266]:

(32) $\phi(k) \approx 6k/\pi^2$.

We may then approximate (31) by:

$f \approx \frac{12}{\pi^2} \sum_{1 \le k < Q/w} \frac{Q - kw}{t - k}$.

If $Q/w$ is large compared to 1 (recall $t > Q/w$), this becomes very nearly:

(33) $f \approx 1.216\, Q \left\{1 + \left(\frac{tw}{Q} - 1\right) \log\left(1 - \frac{Q}{tw}\right)\right\}$, for $t > Q/w$.
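The passage from (31) to (33) can be written out in a few lines. This is a sketch of
one route, under the assumption (suggested by the text but not displayed in it) that
$\phi(k)$ is replaced by its average trend (32) and the sum over $k$ by an integral
from $0$ to $Q/w$; the numerical constant is $12/\pi^2 \approx 1.216$.

```latex
\begin{align*}
f &\approx \frac{12}{\pi^2} \int_0^{Q/w} \frac{Q - kw}{t - k}\, dk
   && \text{(sum replaced by an integral)} \\
  &= \frac{12}{\pi^2} \int_{t - Q/w}^{t} \left( w - \frac{tw - Q}{u} \right) du
   && (u = t - k) \\
  &= \frac{12}{\pi^2} \left\{ Q + (tw - Q) \log\!\left( 1 - \frac{Q}{tw} \right) \right\} \\
  &= 1.216\, Q \left\{ 1 + \left( \frac{tw}{Q} - 1 \right) \log\!\left( 1 - \frac{Q}{tw} \right) \right\}.
\end{align*}
```

The last line depends on $t$ and $w$ only through the product $tw$, in line with the
dimensionless form of the result.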

Despite the cavalier derivation of (33), its agreement with (31) is remarkably
close. Fig. 2 shows a perfectly general comparison of (31) and (33), where the
agreement will be seen to be fairly good even for $t$ and $Q/w$ of the order of 4 or 5.
Note also that (33) nearly always gives a value of $f$ that is too large.

For completeness, we may repeat (27).

(34) $f = 1$ for $t < Q/w$.

Note that only the dimensionless quantities $tw$, $Q$ enter into (33, 34) which are
therefore independent of the normalization (15).

REFERENCE

[1] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon
Press, Oxford, 1938, p. 30.



NONPARAMETRIC ESTIMATION, III. STATISTICALLY EQUIVALENT
BLOCKS AND MULTIVARIATE TOLERANCE 
REGIONS—THE DISCONTINUOUS CASE 

By John W. Tukey 
Princeton University 

1. Summary. In Paper II of this series [2, 1947] it was shown that if n
functions and a sample of n were used to divide the population space into n + 1
blocks in a particular way, and if the joint cumulative of the functions were
continuous, then the n + 1 fractions of the population, corresponding to the n + 1
blocks, were distributed symmetrically and simply.

In Paper I of this series [1, 1945] it was shown that the one-dimensional theory
of tolerance regions could be extended to the discontinuous case, if equalities were
replaced by inequalities.

In this paper the results of Paper II will be extended to the discontinuous case
with the same weakening of the conclusion. The devices involved are more
complex, but the nature of the results is the same (see Section 5).

As a tool, it is shown that any n-variate distribution can be represented in 
terms of an n-variate distribution with a continuous joint cumulative (in fact, 
with uniform univariate marginals), where each variate of the given distribution 
is a different monotone function of the corresponding variate from the continuous 
distribution. 
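In one dimension the representation just described is the familiar quantile
construction, which the following sketch illustrates. It is a hypothetical
univariate example only and is not the multivariate construction of the paper: a
discrete variate with given point probabilities is written as a non-decreasing step
function of a variate that is uniform on (0, 1).

```python
from bisect import bisect_left
from itertools import accumulate


def quantile_function(values, probabilities):
    """Return q with q(u) = smallest value whose cumulative probability is >= u.

    `values` must be sorted in increasing order; q is a non-decreasing step function,
    so a uniform variate U on (0, 1) is carried into the discrete variate q(U).
    """
    cumulative = list(accumulate(probabilities))

    def q(u):
        index = bisect_left(cumulative, u)          # first index with cumulative >= u
        return values[min(index, len(values) - 1)]  # guard against rounding at u near 1

    return q
```

For example, with values 0, 1, 2 and probabilities 0.2, 0.3, 0.5, the function maps
the intervals (0, 0.2], (0.2, 0.5] and (0.5, 1) to 0, 1 and 2, so each value receives
an interval of the uniform variate equal in length to its probability.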

2. Introduction. The importance of extending the simple results of the
continuous case to the more complex results of the discontinuous case may not
be clear at first thought. Yet all the data with which the statistician actually
works comes from discontinuous distributions. Often these distributions are very
fine-grained: the distributions of the number of eggs laid by codfish and of the
measured wavelengths of a spectral line (measured in 0.000001 Å) do not have
large concentrated probabilities, but all their probability is concentrated at
discrete points. Insofar as the considerations of the theoretical statistician apply
to the data as received rather than to the “data” of a more or less imaginary
model, these considerations apply to data with a discrete distribution. When
his theories are erected on a basis of a probability density function, or even a
continuous cumulative, there is a definite extrapolation from theory to practice.
It is, ultimately, a responsibility of the mathematical statistician to study
discrete models and find out the dangerous large effects and the pleasant small
effects which go with such extrapolation. We all deal with discrete data, and must
sooner or later face this fact.

In order to deal with the discontinuous case, we must face two problems: (we 
assume that the reader is familiar with Paper II [2]) 

(1) What to do about “ties”? 


(2) Finite probabilities associated with cuts. 

The first of these is peculiar to the multivariate situation and can be easily 
explained by an example. Consider the three points in the plane with coordinates 
(1, 9), (3, 9) and (2, 6). Let the first two functions be y and x; then the 
procedure of Section 4 of Paper II [2] is not unique, and two possibilities arise: 

Alternative A. (1, 9) is selected as having the largest y, and (3, 9) as having the 
largest x among the remaining (two) points, hence $S_1 = \{(x, y) \mid y > 9\}$, 
$S_2 = \{(x, y) \mid y < 9,\ x > 3\}$, $S_{2|4} = \{(x, y) \mid y < 9,\ x < 3\}$. 

Alternative B. (3, 9) is selected as having the largest y, and (2, 6) as having the 
largest x among the remaining (two) points, hence $S'_1 = \{(x, y) \mid y > 9\}$, 
$S'_2 = \{(x, y) \mid y < 9,\ x > 2\}$, $S'_{2|4} = \{(x, y) \mid y < 9,\ x < 2\}$. 

Notice that $S_2 \neq S'_2$. The procedure is not unique. In the continuous case, 
ties happen with probability zero, hence their consequences could be neglected. 
This is now no longer the case. 

This difficulty is solved by using more functions and the idea of lexicographical 
(like a dictionary!) ordering. In the simplest case, we add no new functions and 
proceed as follows: If there is a unique i for which $\varphi_1(w_i)$ is maximal, 
select it. Otherwise look, among the $w_i$ for which $\varphi_1(w_i)$ is maximal, 
at the values of $\varphi_2(w_i)$. If there is a unique such i for which 
$\varphi_2(w_i)$ is maximal, select it. If not, go on to $\varphi_3(w_i)$, and so 
on. This procedure leads to a specific i unless $\varphi_h(w_j) = \varphi_h(w_k)$ 
for all h and some $j \neq k$. But in this case it does not matter whether j or k is 
selected; the set of m-tuples $\varphi_1(w_i), \ldots, \varphi_m(w_i)$ remaining 
will be the same, although the indices i will not. But the indices play no role in 
the actual construction. 
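
The selection rule just described can be sketched in a few lines of Python, 
whose built-in tuple comparison is already lexicographical. This is only an 
illustration on the three points of the example above; the helper name `select` 
is hypothetical and not part of the paper. 

```python
def select(points, functions):
    """Index of a point whose tuple of function values is lexicographically largest."""
    keys = [tuple(f(p) for f in functions) for p in points]
    best = max(keys)          # Python compares tuples lexicographically
    return keys.index(best)   # any remaining tie is between identical tuples


points = [(1, 9), (3, 9), (2, 6)]
phi = [lambda p: p[1], lambda p: p[0]]  # first function y, second function x

# First selection uses (y, x); the second uses x alone on the remaining points.
first = select(points, phi)
remaining = [p for i, p in enumerate(points) if i != first]
second = select(remaining, phi[1:])

print(points[first], remaining[second])
```

With no extra functions the tie in y between (1, 9) and (3, 9) is broken by x, 
so the sketch selects (3, 9) and then (2, 6), which is Alternative B. 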

As an example, consider the following 20 four-letter words as a sample and let 
there be four functions, the i-th being the negative of the position in the 
alphabet of the i-th letter of the word. (Thus a > b > c > ... > z.) 

Sample: meet, west, made, gone, come, back, said, that, maid, well, with, with, 
just, week, very, near, edge, this, last, have. (The Law of the Three Just Men, 
Edgar Wallace, pp. 159-160). 

Selections: back, made, near, (gone, come, edge, have. The fourth selection to 
be made at random among these four.) The inferences which can be made about 
the four-letter words in Edgar Wallace's writing vocabulary are left to the reader. 

We have just given one rule for breaking ties, one which chooses Alternative B 
in our example. But we might prefer a rule which chooses Alternative A. To 
get more generality, we have only to take M functions, $M \geq m$, and let 
$\varphi_{p(1)}, \varphi_{p(2)}, \ldots, \varphi_{p(m)}$ (where we may suppose 
p(1) = 1 without loss of generality) play the role just taken by 
$\varphi_1, \varphi_2, \ldots, \varphi_m$. Thus if the maximum of $\varphi_1(w)$ 
is not unique proceed to $\varphi_2(w)$, thence to $\varphi_3(w)$, ..., thence to 
$\varphi_M(w)$. For the second block, start with $\varphi_{p(2)}$, then 
$\varphi_{p(2)+1}, \varphi_{p(2)+2}, \ldots, \varphi_M$. And so on. The choice 

$\varphi_1(x, y) = y$, 

$\varphi_2(x, y) = -x$, 

$\varphi_3(x, y) = x e^{y}$, 

$\varphi_4(x, y) = x$, 

$\varphi_5(x, y) = y^2$, 

with p(1) = 1 and p(2) = 4, leads to Alternative A above. (Note that $\varphi_3$ 
is a dummy in the sense that it is never used.) The problem of ties, which was a 
problem in uniqueness of construction, is thus dealt with. 

Next we must deal with the cuts. When we made $S_1$, $S_2$ and $S_{2|4}$ in 
Alternative A, we omitted some points, namely 

$T_1 = \{(x, y) \mid y = 9\}$, and $T_2 = \{(x, y) \mid y < 9,\ x = 3\}$. 

In the continuous case this did not matter, since these sets had probability zero 
and could be avoided. Here they cannot, and we shall have to consider a family 
of blocks (in the wide sense) as consisting of the blocks S and the cuts T. The 
solution of the univariate case in Paper I [1] shows us that what we must expect 
is that: 

$$\Pr\{\text{coverage } S_i + T_{i-1} + T_i \geq t\} \geq \Pr\{\text{coverage of one continuous-case block} \geq t\} \geq \Pr\{\text{coverage } S_i \geq t\}.$$ 

That is, if we want a certain set of blocks to cover (together) at least a certain 
amount with a certain probability we must add the adjoining cuts; and if we 
want a certain set of blocks to cover at most a certain amount with a certain 
probability we may add only those cuts which do not adjoin blocks not in our set. 
By introducing the cuts explicitly, we solve the second problem. 

In order to reduce the size of the cuts, our detailed definitions will differ in 
detail from those which we have used so far. In the example, where the functions 
leading to Alternative A are used, we place in $S_1$ not only the points with y > 9, 
but also those with y = 9 and -x > -1; we place in $S_2$ not only the points with 
y < 9 and x > 3 and the points with y < 9, x = 3, $y^2 > 81$, but also those with 
y = 9 and -x < -1. Proceeding in this way, we reduce $T_1$ to the point x = 1, 
y = 9 and $T_2$ to the point x = 3, y = 9. This reduction can only diminish the 
probability associated with the cuts, but we cannot be sure that it will reduce it 
to zero. 

Only in the quasi-trivial case, where the probability that all functions shall tie 
together is zero, do we return to the simplicity of the continuous case. This case 
is quasi-trivial because it does not arise with discrete probabilities, and real 
observations always involve discrete probabilities. 

Having discussed the results, we should now briefly touch on the methods. 
The proof of the main theorems depends on two facts: 

(1) a representation theorem, (5.3), and 

(2) a lemma, (6.1), which shows that m functions would be enough if (i) the 
distribution were fixed, and (ii) cases of probability zero were neglected. 

The representation theorem has been outlined in the summary. It is analogous 
to, but a definite extension of, the one used in Paper I [1]. It seems to be new in 
statement, though not in thought; it will surprise few probability theorists. The 
novel element is the monotonicity of the functions, which is utterly essential for 
our purposes. 

The lemma allows us to reduce the general case to the case of no extra 
functions, where the reduction must be made differently for each underlying 
distribution. The reduced functions are then represented by the representation 
theorem and the results of Paper II [2] are taken over. The results are stated 
in a form independent of the underlying distribution and the particular 
representation, hence they apply in general. 

The last paragraph stresses the principle common to Paper I [1] and this paper. 
It is natural to call it the "iceberg principle," and to sketch it as follows: "We 
have some information about the visible one-ninth of the iceberg, and we want 
to conclude something about this visible part. If we can imagine another 
eight-ninths, consistent with the part we know, and if using that we can prove 
something expressed solely in terms of the visible part, then this is the required 
proof. (The only essential is to be able to match every visible part.)" Both the 
reduced functions (which depend on the underlying distribution) and the uniform 
variables used to represent them are part of the invisible eight-ninths which 
"could be there." 

3. Terminology and Notation. In general we use the terminology and notation 
of Paper II [2], and we shall continue to assume that all functions concerned 
in the argument are measurable. 

Given two finite sequences of the same length, we write 
$(a_1, a_2, \ldots, a_m) > (b_1, b_2, \ldots, b_m)$ if any of the following hold: 

$a_1 > b_1$, 

$a_1 = b_1$, and $a_2 > b_2$, 

$a_1 = b_1$, $a_2 = b_2$, and $a_3 > b_3$, 

... 

$a_i = b_i$ for $i < m$, and $a_m > b_m$. 

This is the lexicographical order referred to above. (We interpret 
$(a_1, a_2, \ldots, a_m) < (b_1, b_2, \ldots, b_m)$ to mean 
$(b_1, b_2, \ldots, b_m) > (a_1, a_2, \ldots, a_m)$ and = to mean identity.) 

3.1 Definition: Given a sequence of real-valued functions 
$\varphi_1, \varphi_2, \ldots, \varphi_M$ and a sequence of starting indices 
p(1), p(2), ..., p(m), (which we shall often refer to, briefly, as an m-system of 
functions, $\varphi_1, \varphi_2, \ldots, \varphi_M$, without explicitly mentioning 
the starting indices), the functions $\Phi_1, \Phi_2, \ldots, \Phi_m$ are defined 
as follows: 

(3.2) $\Phi_k(w) = \{\varphi_{p(k)}(w), \varphi_{p(k)+1}(w), \ldots, \varphi_M(w)\}$, 

the values of $\Phi_k$ being sequences of M - p(k) + 1 numbers. (In these terms, 
the rule for tie-breaking already explained becomes "select an i for which 
$\Phi_k(w_i)$ is maximal (in the sense of lexicographical ordering)".) 

4. The blocks and cuts determined by n points. 4.1 Definition: Given 
an m-system of functions $\varphi_1, \varphi_2, \ldots, \varphi_M$ and n points 
$w_1, w_2, \ldots, w_n$, ($m \leq n$), the corresponding blocks and cuts are given 
by the following procedure (the $\Phi$'s are defined in 3.1): First i(1) is selected 
to maximize $\Phi_1(w_i)$, when 

$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\}$, 

$T_1 = \{w \mid \Phi_1(w) = \Phi_1(w_{i(1)})\}$. 

Next, i(2) is selected $\neq i(1)$ and to maximize $\Phi_2(w_i)$ among such i, when 

$S_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) > \Phi_2(w_{i(2)})\}$, 

$T_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) = \Phi_2(w_{i(2)})\}$. 

... (the construction is perfectly analogous to II-4.1) 

$S_{m|n+1} = \{w \mid \Phi_k(w) < \Phi_k(w_{i(k)}),\ k = 1, 2, \ldots, m\}$. 
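
A minimal Python sketch of this construction follows, assuming the functions 
are supplied as a list and the starting indices p(k) as 1-based integers. The 
helper names `big_phi`, `build` and `classify` are hypothetical. The five 
functions are those of the example leading to Alternative A, taking the second 
function to be -x (as the later discussion of the cuts indicates) and replacing 
the dummy third function by a constant, which is an assumption made only to 
keep the sketch short. 

```python
def big_phi(point, functions, start):
    """Phi_k(w): the values of functions p(k), ..., M as a tuple (start is 1-based)."""
    return tuple(f(point) for f in functions[start - 1:])


def build(points, functions, starts):
    """Select i(1), ..., i(m), each maximising Phi_k among the points not yet used."""
    selected, remaining = [], list(range(len(points)))
    for p_k in starts:
        i_k = max(remaining, key=lambda i: big_phi(points[i], functions, p_k))
        selected.append(i_k)
        remaining.remove(i_k)
    return selected


def classify(w, points, functions, starts, selected):
    """Return ('S', k), ('T', k), or ('S', 'rest') for the final block."""
    for k, (p_k, i_k) in enumerate(zip(starts, selected), start=1):
        a = big_phi(w, functions, p_k)
        b = big_phi(points[i_k], functions, p_k)
        if a > b:
            return ("S", k)
        if a == b:
            return ("T", k)
    return ("S", "rest")


points = [(1, 9), (3, 9), (2, 6)]
functions = [
    lambda p: p[1],       # y
    lambda p: -p[0],      # -x
    lambda p: 0.0,        # dummy (constant here by assumption; it never decides)
    lambda p: p[0],       # x
    lambda p: p[1] ** 2,  # y squared
]
starts = [1, 4]           # p(1) = 1, p(2) = 4

selected = build(points, functions, starts)
print([points[i] for i in selected])
print(classify((0, 9), points, functions, starts, selected))
print(classify((3, 9), points, functions, starts, selected))
```

The selections are (1, 9) and then (3, 9), i.e. Alternative A, and a point such 
as (0, 9), which ties in y but has the larger -x, falls in the first block rather 
than in the first cut. 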

4.2 Definition: If m = n, then $S_{n|n+1}$ is also denoted by $S_{n+1}$. 

If m > n, then only $\Phi_1, \Phi_2, \ldots, \Phi_n$ are used and $S_{n|n+1}$ is 
also denoted by $S_{n+1}$. 

We denote by $\lambda$ a subset (possibly none, possibly all) of the indices 
1, 2, ..., m and m|n + 1 or, in case $m \geq n$, of the indices 1, 2, ..., n + 1. 

4.3 Definition: The block-group $B_\lambda$ consists of the union of all $S_i$ with 
i in $\lambda$ and all $T_i$ with both i and i + 1 in $\lambda$ (m + 1 means m|n + 1). 

The closed block-group $\bar{B}_\lambda$ consists of the union of all $S_i$ with i in 
$\lambda$ and all $T_i$ with either i or i + 1 in $\lambda$. 

Given any set we define its coverage as the proportion of the population falling 
into it (here the underlying probability distribution appears for the first time in 
this section), and we use 

4.4 Definition: The coverage of $B_\lambda$ is denoted by $C(\lambda)$ and that of 
$\bar{B}_\lambda$ by $\bar{C}(\lambda)$. 

Thus, given a family of functions $\varphi$ and n points w, the space of the w is 
divided into blocks and cuts, these are joined together into block-groups, and these 
block-groups have coverages. Thus, if the family of functions is fixed, the n 
points determine these coverages, and, if the points are chance points, the 
coverages are chance numbers. 

5. Statement of results. Having discussed the construction, we can now 
state the results. 

(5.1) Theorem $A_{m|n+1}$. Let $\varphi_1, \varphi_2, \ldots, \varphi_M$ be any 
m-system of functions and let $W_1, W_2, \ldots, W_n$, where $m \leq n$, be a sample 
from any distribution, let the blocks, cuts, block-groups and coverages be formed, 
as described above, using the same (unknown) distribution for forming the 
coverages. Then, if $\alpha_1, \alpha_2, \ldots, \alpha_p$ are any set of 
$\lambda$'s (each $\lambda$ is a set of indices), 

$$\Pr\{C(\alpha_1) < a_1,\ C(\alpha_2) < a_2,\ \ldots,\ \bar{C}(\alpha_k) > a_k,\ \ldots,\ \bar{C}(\alpha_p) > a_p\}$$ 

$$\geq \Pr\{t(\alpha_1) < a_1,\ t(\alpha_2) < a_2,\ \ldots,\ t(\alpha_k) > a_k,\ \ldots,\ t(\alpha_p) > a_p\},$$ 

where $t(\lambda) = \sum t_i$ for i in $\lambda$, 
$t_{m|n+1} = t_{m+1} + \cdots + t_{n+1}$, and $t_1, t_2, \ldots, t_{n+1}$ have 
a uniform distribution on the barycentric simplex. (Compare Theorem $A_{m|n+1}$ of 
Paper II [2].) 

In particular, 

$$\Pr\{C(i) < a\} \geq I_a(1, n) \geq \Pr\{\bar{C}(i) < a\}, \quad i = 1, 2, \ldots, m,$$ 

where $I_a(1, n)$ is the incomplete Beta-function. 
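
The last inequality can be checked exactly in a small univariate case. The 
sketch below is an illustration under stated assumptions, not part of the paper: 
it takes a single function (the observation itself), a population of 
`num_values` equally likely integers, and a sample of n, so that the first block 
is the set above the sample maximum and the first cut is the sample maximum 
itself. The function name `coverage_sandwich` is hypothetical, and 
$I_a(1, n) = 1 - (1 - a)^n$ is used for the incomplete Beta-function. 

```python
from fractions import Fraction


def coverage_sandwich(num_values, n, a):
    """Exact Pr{block coverage < a}, I_a(1, n), Pr{block-plus-cut coverage < a}."""
    pr_open = Fraction(0)    # block alone: Pr{X > sample maximum}
    pr_closed = Fraction(0)  # block with its cut: Pr{X >= sample maximum}
    for k in range(num_values):
        # probability that the maximum of n independent draws equals k
        p_max = Fraction(k + 1, num_values) ** n - Fraction(k, num_values) ** n
        if Fraction(num_values - 1 - k, num_values) < a:
            pr_open += p_max
        if Fraction(num_values - k, num_values) < a:
            pr_closed += p_max
    beta = 1 - (1 - Fraction(a)) ** n
    return pr_open, beta, pr_closed


for a in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    upper, middle, lower = coverage_sandwich(10, 5, a)
    print(a, float(upper), float(middle), float(lower))
```

In each row the first number is at least the second and the second at least the 
third; the gap between the outer two is the price of the discreteness. 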

(5.2) Theorem $B_{n+1}$. Let $\varphi_1, \varphi_2, \ldots, \varphi_M$ be any 
n-system of functions and let $W_1, W_2, \ldots, W_n$ be a sample from any 
distribution. Then 

$$\Pr\{C(\alpha_1) < a_1,\ C(\alpha_2) < a_2,\ \ldots,\ \bar{C}(\alpha_k) > a_k,\ \ldots,\ \bar{C}(\alpha_p) > a_p\}$$ 

$$\geq \Pr\{t(\alpha_1) < a_1,\ t(\alpha_2) < a_2,\ \ldots,\ t(\alpha_k) > a_k,\ \ldots,\ t(\alpha_p) > a_p\},$$ 

where $t(\lambda) = \sum t_i$ for i in $\lambda$ and $t_1, t_2, \ldots, t_{n+1}$ 
have a uniform distribution on the barycentric simplex. In particular, 

$$\Pr\{C(i) < a\} \geq I_a(1, n) \geq \Pr\{\bar{C}(i) < a\}, \quad i = 1, 2, \ldots, n + 1.$$ 

For convenience of reference, we also state the representation theorem as: 

(5.3) Theorem C. Let $X_1, X_2, \ldots, X_n$ have any joint n-variate distribution. 
Then there exist (real) functions $g_1, g_2, \ldots, g_n$ and a joint distribution 
for $U_1, U_2, \ldots, U_n$ such that, 

(i) the marginal distribution of each $U_i$ is uniform on [0, 1], 

(ii) each function $g_i$ is non-decreasing, 

(iii) the distribution of $g_1(U_1), g_2(U_2), \ldots, g_n(U_n)$ is identical with 
that of $X_1, X_2, \ldots, X_n$. 

6. The functions $\psi_k$. The aim of this section is to prove 

(6.1) Lemma. Given any m-system of functions $\varphi_1, \ldots, \varphi_M$, there 
exist real functions $\psi_1, \psi_2, \ldots, \psi_m$ such that, if 
$W_1, W_2, \ldots, W_n$ are a sample from the distribution concerned: 

(6.2) $\Pr\{\psi_i(W_j) = \psi_i(W_k)$, but $\psi_{i+h}(W_j) \neq \psi_{i+h}(W_k)$ for some $h > 0\} = 0$. 

(6.3) $\Pr\{\Phi_i(W_j)$ has a different relation to $\Phi_i(W_k)$ than that of $\psi_i(W_j)$ to $\psi_i(W_k)\} = 0$, 
where by relation is meant >, =, or <. 

The $\psi_i$ will depend on the underlying probability distribution. Thus they are 
useful in the proof, but could not replace the $\Phi_i$ in the statement of the 
theorems. 

(6.4) Lemma. Let $\Phi(w)$ have its values in a totally ordered set (i.e. always 
either $\Phi_1 < \Phi_2$, $\Phi_1 = \Phi_2$ or $\Phi_1 > \Phi_2$) and let W have a 
distribution. Consider the function $\psi$, 

$$\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}.$$ 

Let $W_1, W_2, \ldots, W_n$ be a sample from the same distribution, then, with 
probability one, the relation (<, =, or >) between $\Phi(W_j)$ and $\Phi(W_k)$ is 
the same as that between $\psi(W_j)$ and $\psi(W_k)$. 

If $\Phi(w_j) \leq \Phi(w_k)$, then $\psi(w_j) \leq \psi(w_k)$; if 
$\psi(w_j) < \psi(w_k)$, then $\Phi(w_j) < \Phi(w_k)$. 

These follow directly from the definition. To prove the lemma, then, we must 
show that 

(i) $\psi(w_j) = \psi(w_k)$ but $\Phi(w_j) < \Phi(w_k)$ occurs with probability zero. 

We may clearly assume that the totally ordered set is complete, and that, in 
particular, it contains the symbols $-\infty$ and $+\infty$. Consider the real 
function of an abstract variable, 

$$F(s) = \Pr\{\Phi(W) < s\}.$$ 

It is a monotone function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. We can 
therefore, given $\epsilon > 0$, select elements 
$-\infty = s_0 < s_1 < s_2 < \cdots < s_N = +\infty$ such that 

$$0 \leq F(s_{i+1}) - F(s_i + 0) < \epsilon.$$ 

If (i) occurs, then $\Phi(w_j)$ and $\Phi(w_k)$ belong either to the same open 
interval $(s_i, s_{i+1})$ or one belongs to an open interval and the other is its 
upper endpoint. The probability of either of these happening is at most 

$$\frac{n(n-1)}{2}\{F(s_{i+1}) - F(s_i + 0)\}^2 + n(n-1)\{F(s_{i+1}) - F(s_i + 0)\}\{F(s_{i+1} + 0) - F(s_{i+1})\}.$$ 

Summing this over all intervals yields an estimate of 

$$n(n-1) \operatorname{Max}\{F(s_{i+1}) - F(s_i + 0)\} \leq n(n-1)\,\epsilon.$$ 

Since this goes to zero, the lemma is established. 
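
For a distribution with finitely many points the function of Lemma (6.4) can 
be computed directly. The sketch below is only an illustration: the three-point 
distribution and its probabilities are assumptions chosen for the example, and 
the names `psi` and `relation` are hypothetical. It checks that the real-valued 
function orders the points exactly as the sequence-valued one does. 

```python
from fractions import Fraction
from itertools import combinations


def psi(w, pmf, big_phi):
    """psi(w) = Pr{Phi(W) < Phi(w)} for a finite distribution {point: probability}."""
    return sum((p for v, p in pmf.items() if big_phi(v) < big_phi(w)), Fraction(0))


def relation(a, b):
    """-1, 0 or +1 according as a < b, a == b or a > b."""
    return (a > b) - (a < b)


# Assumed population: the three points of the earlier example, with made-up weights.
pmf = {(1, 9): Fraction(1, 4), (3, 9): Fraction(1, 4), (2, 6): Fraction(1, 2)}
big_phi = lambda p: (p[1], p[0])  # the sequence (y, x), compared lexicographically

same_order = all(
    relation(big_phi(u), big_phi(v))
    == relation(psi(u, pmf, big_phi), psi(v, pmf, big_phi))
    for u, v in combinations(pmf, 2)
)
print(same_order, [psi(w, pmf, big_phi) for w in pmf])
```

Every point here carries positive probability, so the agreement is exact; in 
general the lemma allows disagreement only on a set of probability zero. 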

We turn now to the proof of (6.1). The system of functions 
$\varphi_1, \varphi_2, \ldots, \varphi_M$ define the 
$\Phi_1, \Phi_2, \ldots, \Phi_m$ according to Section 3. These define 
$\psi_1, \psi_2, \ldots, \psi_m$ according to lemma (6.4) just proved. Applying 
this m times proves (6.3). Recalling that $\Phi_i(w_j) = \Phi_i(w_k)$ implies 
$\Phi_{i+h}(w_j) = \Phi_{i+h}(w_k)$, we see that (6.3) implies (6.2). 

7. The notation $F(x + \lambda\cdot 0)$. All practitioners of analysis are familiar 
with F(x + 0) and F(x - 0), defined by 

$$F(x \pm 0) = \lim_{h \downarrow 0} F(x \pm h).$$ 

We now generalize this formal notation to 

(7.1) $F(x + \lambda\cdot 0) = \dfrac{1 + \lambda}{2} F(x + 0) + \dfrac{1 - \lambda}{2} F(x - 0)$, 

where we will, in our immediate applications, need only $\lambda$'s between -1 and 
+1 (although the definition applies in general). Notice, for example, that 

$$F(x - 0) \leq F(x + \lambda\cdot 0) \leq F(x + 0), \quad \text{for } -1 \leq \lambda \leq 1,$$ 

that if F is continuous at x, 

$$F(x + \lambda\cdot 0) = F(x \pm 0) = F(x),$$ 

that the condition for F to be normalized is 

$$F(x + 0\cdot 0) = F(x).$$ 
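
As a small numerical illustration of this notation, the following sketch 
evaluates the weighted mean of the two one-sided limits with exact fractions. 
The function name `f_lambda` and the jump from 1/4 to 3/4 are assumptions made 
for the example. 

```python
from fractions import Fraction


def f_lambda(f_minus, f_plus, lam):
    """F(x + lam*0): the mean of F(x+0) and F(x-0) with weights (1+lam)/2, (1-lam)/2."""
    lam = Fraction(lam)
    return (1 + lam) / 2 * f_plus + (1 - lam) / 2 * f_minus


# A cumulative that jumps from 1/4 to 3/4 at the point x.
f_minus, f_plus = Fraction(1, 4), Fraction(3, 4)
values = [f_lambda(f_minus, f_plus, lam) for lam in (-1, Fraction(-1, 2), 0, 1)]
print(values)
```

The endpoints -1 and +1 recover the two one-sided limits, and 0 gives the 
midpoint of the jump, which is the normalized value. 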

A similar definition is made for functions of two variables, namely 

$$F(x + \lambda\cdot 0,\ y + \mu\cdot 0) = \frac{1 + \mu}{2} F(x + \lambda\cdot 0,\ y + 0) + \frac{1 - \mu}{2} F(x + \lambda\cdot 0,\ y - 0)$$ 

$$= \frac{1 + \lambda}{2} F(x + 0,\ y + \mu\cdot 0) + \frac{1 - \lambda}{2} F(x - 0,\ y + \mu\cdot 0),$$ 

where the two right-hand sides are equal if, as is the case for cumulatives, all 
doubly one-sided limits exist. 

If $F(x_1, x_2)$ is the joint cumulative of two variates, then, when all ordinates 
and abscissas involved are ordinates and abscissas of continuity, 

$$\Pr\{a < x < b,\ c < y < d\} = F(b, d) - F(b, c) - F(a, d) + F(a, c) \geq 0.$$ 

Passing to the limit in assorted ways, and taking linear combinations gives 

(7.2) $F(b + \mu\cdot 0,\ d + \rho\cdot 0) - F(b + \mu\cdot 0,\ c + \nu\cdot 0) - F(a + \lambda\cdot 0,\ d + \rho\cdot 0) + F(a + \lambda\cdot 0,\ c + \nu\cdot 0) \geq 0$, 

for $-\infty \leq a, b, c, d \leq +\infty$ (with a < b and c < d, as in the 
probability above) and $-1 \leq \lambda, \mu, \nu, \rho \leq 1$. This will be of 
use shortly. 

8. The representation theorem. It was shown in Paper I [1] of this series, 
that the uniform distribution on [0, 1] could serve as the prototype of any variate, 
that is, that given a distribution, there is a monotone function g, so that g(U) 
has the given distribution, where U has the uniform distribution on [0, 1]. 
(In Paper I, U was denoted by X*). 

In the notation of the last section, there is a function $\lambda(u)$, with 
$|\lambda(u)| \leq 1$, so that 

(8.1) $F(g(u) + \lambda(u)\cdot 0) = u$, 

for all u. (We may, and shall, require that $g(u) = -\infty$ for u < 0, and 
$g(u) = +\infty$ for u > 1). It is easy to see that g(u) is unique except on a set 
of probability zero and that $\lambda(u)$ is unique (and in fact linear) on each 
open interval which contains no value of F(x). 
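
For a distribution concentrated on finitely many points, equation (8.1) can be 
solved explicitly. The sketch below is an illustration under that assumption; 
the function name `represent` and the three-atom distribution are hypothetical. 
At a jump from F(x - 0) to F(x + 0) the weighted-mean definition of the last 
section gives $\lambda(u) = (2u - F(x+0) - F(x-0)) / (F(x+0) - F(x-0))$, which 
is linear in u. 

```python
from fractions import Fraction


def represent(atoms, u):
    """Return (g(u), lambda(u)) for 0 < u <= 1; atoms is a sorted list of (value, probability)."""
    below = Fraction(0)                 # F(x - 0) at the current atom
    for x, p in atoms:
        above = below + p               # F(x + 0) at the current atom
        if u <= above:
            lam = (2 * u - above - below) / p
            return x, lam
        below = above
    raise ValueError("u must lie in (0, 1]")


atoms = [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (5, Fraction(1, 4))]
for u in (Fraction(1, 8), Fraction(3, 8), Fraction(1, 2), Fraction(7, 8)):
    print(u, represent(atoms, u))
```

The values of u falling in one jump all map to the same g(u), so g is a 
non-decreasing step function, and g(U) takes each value x with probability equal 
to the length of that jump. 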

Each cumulative F(x), then, serves to define g(u) and $\lambda(u)$ by the equation 
(8.1). Two or more independent variates can be thrown back on a set of 
independent uniform variates by applying this process to their cumulatives 
separately. 

Our present problem is to prove Theorem C (5.3), which applies to variates 
$X_1, X_2, \ldots, X_n$ which need not be independent. Let $F_i(x_i)$ be the 
(marginal) cumulative of $X_i$, and use (8.1) to define $g_i(u_i)$ and 
$\lambda_i(u_i)$. Then define the joint distribution of $U_1, U_2, \ldots, U_n$ by 

$$G(u_1, \ldots, u_n) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ \ldots,\ g_n(u_n) + \lambda_n(u_n)\cdot 0),$$ 

where $F(x_1, x_2, \ldots, x_n)$ is the joint cumulative of the 
$X_1, X_2, \ldots, X_n$. 

We shall verify that this is the desired distribution in the case n = 2, leaving 
the general case to the reader. Consider 
$G(u_1, +\infty) = G(u_1, 1) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ +\infty)$. 
This is a cumulative, and so is $G(+\infty, u_2)$. In fact, using (8.1) they are 
each the uniform cumulative 

$$G(u) = \begin{cases} 0, & u < 0, \\ u, & 0 \leq u \leq 1, \\ 1, & 1 < u. \end{cases}$$ 

By (7.2) all second differences are positive, and hence $G(u_1, u_2)$ is a joint 
cumulative. Since its marginals are uniform, it is continuous. 

Finally, 

$$\Pr\{g_1(U_1) < s_1,\ g_2(U_2) < s_2\} = G(F(s_1 - 0, +\infty),\ F(+\infty, s_2 - 0)) = F(s_1 - 0,\ s_2 - 0),$$ 

since $g_1(u_1) < s_1$ is equivalent to $u_1 < F(s_1 - 0, +\infty)$ and 
$g_2(u_2) < s_2$ is equivalent to $u_2 < F(+\infty, s_2 - 0)$. Thus $g_1(U_1)$ and 
$g_2(U_2)$ have the given bivariate distribution. 

9. Proof of main theorems. We come now to the proof of Theorems $A_{m|n+1}$ 
and $B_{n+1}$, and we begin with $A_{m|n+1}$. According to Lemma (6.1), the various 
indices i(1), i(2), ..., i(m) selected to determine the blocks will be the same, 
excluding cases of probability zero, whether the $\Phi_i$ or the $\psi_i$ are used. 
Consider the first block, which takes the forms: 

$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\}$, 

$\phantom{S_1} = \{w \mid \psi_1(w) > \psi_1(w_{i(1)})\}$. 

Another application of Lemma (6.1) shows that these sets differ by a set of 
probability zero, and hence their coverages are identical. It will thus suffice to 
prove Theorem $A_{m|n+1}$ for a fixed underlying distribution and the corresponding 
$\psi_1, \psi_2, \ldots, \psi_m$. 

According to Theorem C (5.3), the m-variate distribution of the $\psi_i(W)$ can be 
represented in terms of uniformly distributed variates $U_1, \ldots, U_m$ and 
monotone functions $g_1(U_1), \ldots, g_m(U_m)$. Now $U_1, U_2, \ldots, U_m$ have a 
continuous joint cumulative, so that Theorem $A_{m|n+1}$ of Paper II [2] applies to 
a sample of n drawn from this m-variate population, with the coordinates themselves 
as the m functions. We shall denote the coordinates of the i-th element of this 
sample by $u_1(i), \ldots, u_m(i)$. Consider the first block, 

$S_1 = \{(U_1, \ldots, U_m) \mid U_1 > u_1(i(1))\}$. 

Its image, $g(S_1) = \{(g_1(U_1), \ldots, g_m(U_m)) \mid U_1 > u_1(i(1))\}$, 
contains 

$S_1^* = \{(g_1(U_1), \ldots, g_m(U_m)) \mid g_1(U_1) > g_1(u_1(i(1)))\}$, 

and is contained in the union of $S_1^*$ and $T_1^*$, where 

$T_1^* = \{(g_1(U_1), \ldots, g_m(U_m)) \mid g_1(U_1) = g_1(u_1(i(1)))\}$. 

Thus the conclusions of Theorem $A_{m|n+1}$ hold for 
$S_1^*, T_1^*, \ldots, S_m^*, T_m^*, S_{m|n+1}^*$. 

Now while Theorem $A_{m|n+1}$ mentions the underlying W's implicitly, careful 
study shows that they are not really involved; only the joint distribution of the 
$\varphi_i$, which in our present case are the $\psi_i$, matters. Since this is the 
same for the $\psi_i(W)$ and the $g_i(U_i)$, Theorem $A_{m|n+1}$ must hold for the 
$\psi_i$ and the theorem is proved. 

Theorem $B_{n+1}$ is again a special case of Theorem $A_{m|n+1}$. 

REFERENCES 

[1] H. Scheffé and J. W. Tukey, "Nonparametric Estimation I. Validation of order 
statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192 (Also cited as 
Paper I). 

[2] J. W. Tukey, "Nonparametric Estimation II. Statistically equivalent blocks and 
multivariate tolerance regions. The continuous case," Annals of Math. Stat., Vol. 
18 (1947), pp. 529-539 (Also cited as Paper II). 



ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD 
ESTIMATE OF AN UNKNOWN PARAMETER OF A DISCRETE 
STOCHASTIC PROCESS 


By Abraham Wald 
Columbia University 

Summary. Asymptotic properties of maximum likelihood estimates have 
been studied so far mainly in the case of independent observations. In this 
paper the case of stochastically dependent observations is considered. It is 
shown that under certain restrictions on the joint probability distribution of the 
observations the maximum likelihood equation has at least one root which is a 
consistent estimate of the parameter $\theta$ to be estimated. Furthermore, any root 
of the maximum likelihood equation which is a consistent estimate of $\theta$ is 
shown to be asymptotically efficient. Since the maximum likelihood estimate is 
always a root of the maximum likelihood equation, consistency of the maximum 
likelihood estimate implies its asymptotic efficiency. 

1. Introduction. Let $\{X_i\}$, (i = 1, 2, ..., ad inf.), be a sequence of chance 
variables. It is assumed that for any positive integral value n the first n chance 
variables $X_1, \ldots, X_n$ admit a joint probability density function 
$p_n(x_1, \ldots, x_n, \theta)$ involving an unknown parameter $\theta$. The 
consistency relations 

(1.1) $\displaystyle\int_{-\infty}^{+\infty} p_{n+1}(x_1, \ldots, x_{n+1}, \theta)\, dx_{n+1} = p_n(x_1, \ldots, x_n, \theta)$ 

are assumed to hold. 

In what follows, for any chance variable u the symbol $E(u \mid \theta)$ will denote 
the expected value of u when $\theta$ is the true parameter value. 

Let $t_n(x_1, \ldots, x_n)$ be an unbiassed estimate of $\theta$. Cramér [1] and 
Rao [2] have shown that under some weak regularity conditions on the distribution 
function $p_n(x_1, \ldots, x_n, \theta)$, the variance of $t_n$ cannot fall short of 
the value 

(1.2) $\dfrac{1}{c_n(\theta)} = \dfrac{1}{E\left[\left(\dfrac{\partial \log p_n}{\partial \theta}\right)^2 \,\Big|\, \theta\right]}$. 

Thus, for any unbiassed estimate $t_n$ the variate $\sqrt{c_n(\theta)}\,(t_n - \theta)$ 
has mean value zero and variance $\geq 1$. An estimate $t_n$ is called efficient if 
$\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance 1. 

A sequence $\{t_n\}$, (n = 1, 2, ..., ad inf.), of estimates is said to be 
asymptotically efficient if the mean of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is zero 
and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is 1 in the limit as 
$n \to \infty$. In the literature usually the additional requirement is made that 
the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ be normal. 
To make a distinction between the two cases when the condition concerning the 
limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is fulfilled or not, 
we shall say that $\{t_n\}$ is asymptotically efficient in the wide sense if it 
satisfies the conditions concerning the mean and the variance of 
$\sqrt{c_n(\theta)}\,(t_n - \theta)$. If, in addition, the limiting distribution of 
$\sqrt{c_n(\theta)}\,(t_n - \theta)$ is normal, we shall say that $\{t_n\}$ is 
asymptotically efficient in the strict sense. Clearly, if $\{t_n\}$ is 
asymptotically efficient in the strict sense, it is also asymptotically efficient 
in the wide sense. 

A word of clarification is needed as to the meaning of the conditions concerning 
the mean and variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. One interpretation 
would be that the requirement is that 

(1.3) $\displaystyle\lim_{n \to \infty} E\left[\sqrt{c_n(\theta)}\,(t_n - \theta) \mid \theta\right] = 0$ 

and 

(1.4) $\displaystyle\lim_{n \to \infty} E\left[c_n(\theta)\,(t_n - \theta)^2 \mid \theta\right] = 1$. 

Another interpretation would be that the requirement is that the limiting 
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, provided that the limit 
distribution exists as $n \to \infty$, should have zero mean and unit variance. 
These two interpretations are certainly not equivalent. It seems to the author that 
the mean and variance of the limiting distribution is more relevant than the limits 
of the mean and the variance. We shall, therefore, adopt the following definition of 
asymptotic efficiency: 

Definition: A sequence $\{t_n\}$ of estimates is said to be asymptotically efficient 
in the wide sense if a sequence $\{u_n\}$, (n = 1, 2, ..., ad inf.), of chance 
variables exists such that 

(1.5) $\displaystyle\lim_{n \to \infty} E(u_n \mid \theta) = 0, \quad \lim_{n \to \infty} E(u_n^2 \mid \theta) = 1$ 

and 

(1.6) $\sqrt{c_n(\theta)}\,(t_n - \theta) - u_n$ 

converges stochastically to zero as $n \to \infty$. If, in addition, the limiting 
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and is normal, 
$\{t_n\}$ is said to be asymptotically efficient in the strict sense. 

The reason that a sequence $\{u_n\}$ of chance variables is considered in the above 
definition, instead of the limiting distribution of 
$\sqrt{c_n(\theta)}\,(t_n - \theta)$, is that the existence of a limiting 
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is not postulated. If a 
limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and if this 
limiting distribution has zero mean and unit variance, a sequence $\{u_n\}$ of 
chance variables satisfying the conditions (1.5) and (1.6) always exists. This can 
be seen as follows: Let $T_n$ denote the chance variable 
$\sqrt{c_n(\theta)}\,(t_n - \theta)$ and let $F_n(t) = \text{prob.}\{T_n < t\}$. If 
a limiting distribution of $T_n$ exists and if this limiting distribution has zero 
mean and unit variance, then 

(1.7) $\displaystyle\lim_{a \to \infty}\left[\lim_{n \to \infty}\int_{-a}^{a} t\, dF_n(t)\right] = 0$ and $\displaystyle\lim_{a \to \infty}\left[\lim_{n \to \infty}\int_{-a}^{a} t^2\, dF_n(t)\right] = 1$. 

From (1.7) it follows that there exists a sequence $\{a_n\}$, (n = 1, 2, ..., ad 
inf.), of positive values such that the following conditions are fulfilled: 

(1.8) $\displaystyle\lim_{n \to \infty}\int_{-a_n}^{a_n} t\, dF_n(t) = 0; \quad \lim_{n \to \infty}\int_{-a_n}^{a_n} t^2\, dF_n(t) = 1; \quad \lim_{n \to \infty} \text{Prob}\{|T_n| > a_n\} = 0$. 

Let $u_n$ be a chance variable which is equal to $T_n$ whenever $|T_n| \leq a_n$, 
and equal to zero otherwise. Clearly, the sequence $\{u_n\}$ will satisfy 
conditions (1.5) and (1.6). 

In the following section we shall formulate some assumptions concerning the 
probability density function $p_n(x_1, \ldots, x_n, \theta)$. It will then be shown 
in section 3 that there exists a root of the maximum likelihood equation 

(1.9) $\dfrac{\partial \log p_n(x_1, \ldots, x_n, \theta)}{\partial \theta} = 0$ 

which is asymptotically efficient at least in the wide sense. 


2. Assumptions concerning the probability density $p_n(x_1, \ldots, x_n, \theta)$. 
We shall assume that there exists a finite non-degenerate interval A on the 
$\theta$-axis such that the following conditions hold: 

Condition 1. The derivatives $\dfrac{\partial^i p_n}{\partial \theta^i}$, 
(i = 1, 2, 3), exist for all $\theta$ in A and for all samples 
$(x_1, \ldots, x_n)$ except perhaps for a set of measure zero. We have furthermore, 

(2.1) $\displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \underset{\theta \in A}{\text{l.u.b.}} \left|\frac{\partial^i p_n}{\partial \theta^i}\right| dx_1 \cdots dx_n < \infty$, (i = 1, 2). 

Condition 2. For any $\theta$ in A we have $\displaystyle\lim_{n \to \infty} c_n(\theta) = \infty$. 

Condition 3. For any $\theta$ in A the standard deviation of 
$\dfrac{\partial^2 \log p_n}{\partial \theta^2}$ divided by the expected value of 
$\dfrac{\partial^2 \log p_n}{\partial \theta^2}$ (both computed under the assumption 
that $\theta$ is true) converges to zero as $n \to \infty$. 

Condition 4. There exists a positive $\delta$ such that for any $\theta$ in A the 
expression 

(2.2) $\dfrac{1}{c_n(\theta)}\, E\left[\underset{\theta'}{\text{l.u.b.}} \left|\dfrac{\partial^3 \log p_n(x_1, \ldots, x_n, \theta')}{\partial \theta'^3}\right| \,\Big|\, \theta\right]$ 

is a bounded function of n where $\theta'$ is restricted to the interval 
$|\theta' - \theta| \leq \delta$. 

In what follows in this section, as well as in section 3, the domain of $\theta$ 
will be restricted to interior points of the interval A unless a statement to the 
contrary is explicitly made. 

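Clearly 

(2.3) $\displaystyle E\left(\frac{\partial \log p_n}{\partial \theta} \,\Big|\, \theta\right) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n$. 

It follows from Condition 1 that 

(2.4) $\displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n = \frac{\partial}{\partial \theta}\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p_n\, dx_1 \cdots dx_n = 0$. 

Hence, 

(2.5) $\displaystyle E\left(\frac{\partial \log p_n}{\partial \theta} \,\Big|\, \theta\right) = 0$. 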

We have 



because of Condition 1. 
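From (2.7) and (2.8) we obtain 

(2.9) $\displaystyle E\left(\frac{\partial^2 \log p_n}{\partial \theta^2} \,\Big|\, \theta\right) = -c_n(\theta)$. 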


Conditions 3 and 4 will generally be fulfilled when the stochastic dependence 
of $x_j$ on $x_i$ decreases sufficiently fast with increasing value of $|i - j|$. 
For, in such cases, the following order relations will generally hold: The standard 
deviation of $\dfrac{\partial^2 \log p_n}{\partial \theta^2}$ will, in general, be 
of the order $\sqrt{n}$, the expected value of 
$\text{l.u.b.} \left|\dfrac{\partial^3 \log p_n}{\partial \theta^3}\right|$ will 
usually be of the order n, and $\dfrac{c_n(\theta)}{n}$ will generally have a 
positive lower bound and a finite upper bound. 



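3. Proof that the maximum likelihood equation has a root which is an 
asymptotically efficient estimate of $\theta$ (at least in the wide sense). Let 
$\theta_0$ denote the true parameter value and let $\theta$ be any other value. We 
put 

(3.1) $\Phi_n(x_1, \ldots, x_n, \theta) = \dfrac{\partial \log p_n}{\partial \theta}$, 
$\Phi_n'(x_1, \ldots, x_n, \theta) = \dfrac{\partial^2 \log p_n}{\partial \theta^2}$, 
and $\Phi_n''(x_1, \ldots, x_n, \theta) = \dfrac{\partial^3 \log p_n}{\partial \theta^3}$. 

Expanding $\Phi_n(x_1, \ldots, x_n, \theta)$ in a Taylor expansion around 
$\theta = \theta_0$ we obtain 

(3.2) $\Phi_n(x_1, \ldots, x_n, \theta) = \Phi_n(x_1, \ldots, x_n, \theta_0) + (\theta - \theta_0)\,\Phi_n'(x_1, \ldots, x_n, \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^2\,\Phi_n''(x_1, \ldots, x_n, \theta')$ 

where $\theta'$ is some value between $\theta_0$ and $\theta$. Dividing both sides 
of (3.2) by $c_n(\theta_0)$ we obtain 

(3.3) $\dfrac{\Phi_n(x_1, \ldots, x_n, \theta)}{c_n(\theta_0)} = \dfrac{\Phi_n(x_1, \ldots, x_n, \theta_0)}{c_n(\theta_0)} + (\theta - \theta_0)\,\dfrac{\Phi_n'(x_1, \ldots, x_n, \theta_0)}{c_n(\theta_0)} + \dfrac{1}{2}(\theta - \theta_0)^2\,\dfrac{\Phi_n''(x_1, \ldots, x_n, \theta')}{c_n(\theta_0)}$. 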


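From Condition 3 and equation (2.9) it follows that 

(3.4) $\displaystyle\operatorname*{plim}_{n \to \infty} \frac{\Phi_n'(x_1, \ldots, x_n, \theta_0)}{c_n(\theta_0)} = -1$ 

where the operator plim stands for convergence in probability (stochastic 
convergence). 

According to equation (2.5) the expected value of 
$\Phi_n(x_1, \ldots, x_n, \theta_0)$ is zero. Since the variance of 
$\Phi_n(x_1, \ldots, x_n, \theta_0)$ is equal to $c_n(\theta_0)$, and since 
$\displaystyle\lim_{n \to \infty} c_n(\theta) = \infty$, we have 

(3.5) $\displaystyle\operatorname*{plim}_{n \to \infty} \frac{\Phi_n(x_1, \ldots, x_n, \theta_0)}{c_n(\theta_0)} = 0$. 

It follows from Condition 4 that for any $\theta$ with 
$|\theta - \theta_0| \leq \delta$ we have 

(3.6) $\dfrac{1}{c_n(\theta_0)}\, E\left[\,|\Phi_n''(x_1, \ldots, x_n, \theta)| \,\big|\, \theta_0\right] = O(1)$. 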

According to Markoff's inequality the probability that a positive random 
variable will exceed $\lambda$-times its expected value is not greater than 
$\dfrac{1}{\lambda}$. Hence, it follows from (3.6) that for any $\epsilon > 0$ we 
can find a positive value $k_\epsilon$ such that 

(3.7) $\displaystyle\limsup_{n \to \infty}\ \text{Prob}\left\{\frac{|\Phi_n''(x_1, \ldots, x_n, \theta')|}{c_n(\theta_0)} \geq k_\epsilon\right\} \leq \epsilon$. 

Let $\rho$ be any given positive number. The probability that the maximum 
likelihood equation 

(3.8) $\Phi_n(x_1, \ldots, x_n, \theta) = 0$ 

will have a root in the interval $(\theta_0 - \rho,\ \theta_0 + \rho)$ converges to 
one as $n \to \infty$. This follows easily from (3.3), (3.4), (3.5) and (3.7). Thus, 
we have shown that the maximum likelihood equation has a root $\hat\theta_n$ which 
is a consistent estimate, i.e. it satisfies the relation 

(3.9) $\displaystyle\operatorname*{plim}_{n \to \infty}\, (\hat\theta_n - \theta_0) = 0$. 
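
To make the argument concrete, here is a small Python sketch on a model that is 
NOT discussed in the paper and is assumed purely for illustration: dependent 
observations $x_i = \theta x_{i-1} + e_i$ with $x_0 = 0$ and independent 
standard normal $e_i$. For this model the derivative of the log-likelihood is 
$\sum x_{i-1}(x_i - \theta x_{i-1})$, which is linear in $\theta$, so the Taylor 
expansion around the true value has no remainder term and the root of the 
likelihood equation can be written down at once. All function names are 
hypothetical. 

```python
import random


def score(theta, xs):
    """Derivative of the log-likelihood in the assumed autoregressive model."""
    prev = [0.0] + xs[:-1]
    return sum(a * (b - theta * a) for a, b in zip(prev, xs))


def score_prime(xs):
    """Second derivative of the log-likelihood; free of theta in this model."""
    prev = [0.0] + xs[:-1]
    return -sum(a * a for a in prev)


def simulate(theta0, n, seed=0):
    """Draw x_1, ..., x_n from the assumed model with a fixed seed."""
    rng = random.Random(seed)
    xs, x = [], 0.0
    for _ in range(n):
        x = theta0 * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


theta0 = 0.5
xs = simulate(theta0, 2000)

# Solve score(theta0) + (theta - theta0) * score_prime = 0 for theta.
theta_hat = theta0 - score(theta0, xs) / score_prime(xs)
print(theta_hat, score(theta_hat, xs))
```

The printed root lies close to the assumed true value and the score vanishes 
there up to rounding; how close depends on n and on the seed. 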


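We shall now show that if $\hat\theta_n$ is a root of the maximum likelihood equation
(3.8) and if $\hat\theta_n$ is a consistent estimate, then $\hat\theta_n$ is also asymptotically efficient,
at least in the wide sense. For this purpose we substitute $\hat\theta_n$ for $\theta$ in (3.3) and
multiply both sides of the equation by $\sqrt{c_n(\theta_0)}$. We then obtain

(3.10)    $0 = \dfrac{\Phi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)\,\dfrac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)^2\, v_n,$

where

(3.11)    $v_n = \dfrac{1}{2}\,\dfrac{\Phi_n''(x_1, \dots, x_n, \theta_n')}{c_n(\theta_0)}.$

Let

(3.12)    $y_n = \dfrac{\Phi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}}$  and  $z_n = \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0).$

Then (3.10) gives

(3.13)    $-y_n = z_n\,\dfrac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + z_n(\hat\theta_n - \theta_0)\, v_n.$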


It follows from (3.7) and (3.9) that

(3.14)    $\operatorname*{plim}_{n \to \infty} (\hat\theta_n - \theta_0)\, v_n = 0.$

From (3.4), (3.13) and (3.14) we obtain

(3.15)    $-y_n = z_n(-1 + \epsilon_n)$

where

(3.16)    $\operatorname*{plim}_{n \to \infty} \epsilon_n = 0.$

Since $E y_n = 0$ and $E y_n^2 = 1$, it follows from (3.15) and (3.16) that

(3.17)    $\operatorname*{plim}_{n \to \infty} (z_n - y_n) = 0.$

The asymptotic efficiency (in the wide sense) of $\hat\theta_n$ is an immediate consequence
of (3.17). Our main result may be summarized in the following theorem:
Theorem. If the true value of the parameter $\theta$ is an interior point of an interval
A satisfying the conditions 1-4, then the maximum likelihood equation (1.9)
has a root¹ which is a consistent estimate of $\theta$. Furthermore, any root of (1.9)
which is a consistent estimate of $\theta$ is also asymptotically efficient at least in the wide
sense.

Since the maximum likelihood estimate is a root of (1.9), it follows from the 
above theorem that whenever the maximum likelihood estimate is consistent, 
it is also asymptotically efficient at least in the wide sense. 
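
As an informal illustration of (3.17), the sketch below evaluates $y_n$ and $z_n$ for a simple special case that is not taken from the paper: independent exponential observations with density $\theta e^{-\theta x}$. Under that assumption the score is $\Phi_n(\theta) = n/\theta - \sum x_i$, its variance is $c_n(\theta_0) = n/\theta_0^2$, and the root of the likelihood equation is $\hat\theta_n = n/\sum x_i$. The function name `wald_statistics` and the chosen sample sizes are hypothetical.

```python
import math
import random


def wald_statistics(xs, theta0):
    """Return (y_n, z_n) of (3.12) for an i.i.d. exponential sample.

    Assumed model (an illustration, not the paper's general process):
      density          theta * exp(-theta * x)
      Phi_n(theta)     n / theta - sum(xs)      (score of the sample)
      c_n(theta0)      n / theta0**2            (variance of the score)
      theta_hat        n / sum(xs)              (root of Phi_n = 0)
    """
    n = len(xs)
    total = sum(xs)
    c_n = n / theta0 ** 2
    y_n = (n / theta0 - total) / math.sqrt(c_n)
    theta_hat = n / total
    z_n = math.sqrt(c_n) * (theta_hat - theta0)
    return y_n, z_n


if __name__ == "__main__":
    rng = random.Random(0)
    theta0 = 2.0
    for n in (100, 10_000, 1_000_000):
        xs = [rng.expovariate(theta0) for _ in range(n)]
        y_n, z_n = wald_statistics(xs, theta0)
        # z_n - y_n should shrink as n grows, in line with (3.17)
        print(n, round(y_n, 4), round(z_n, 4), round(z_n - y_n, 4))
```

In this special case $z_n - y_n = \sqrt{n}\,(\theta_0\bar{x} - 1)^2/(\theta_0\bar{x})$, which is of order $n^{-1/2}$, so the printed difference is expected to shrink as $n$ increases.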

REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.

[2] C. R. Rao, "Information and the accuracy attainable in the estimation of statistical
parameters", Bull. Calcutta Math. Soc., Vol. 37 (1946).

¹ The probability that (1.9) has at least one root converges to unity as $n \to \infty$.



DISTRIBUTION OF A ROOT OF A DETERMINANTAL EQUATION 

By D. N. Nanda 

Institute of Statistics , University of North Carolina 

1. Summary. S. N. Roy [2] obtained in 1943 the distribution of the maximum,
minimum and any intermediate one of the roots of certain determinantal
equations based on covariance matrices of two samples on the null hypothesis
of equal covariance matrices in the two populations. The present paper gives
a different method of working out the distribution of any of these roots under
the same hypothesis. The distribution of the largest, smallest and any intermediate
root when the roots are specified by their position in a monotonic arrangement
has been derived for p = 2, 3, 4, and 5 by the new method. The
method is applicable for obtaining the distribution of the roots of an equation of
any order, when the distributions of the roots of lower order equations have been
worked out.


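2. Introduction. If $x = \| x_{ij} \|$ and $x^* = \| x^*_{ij} \|$ are two p-variate sample
matrices with $n_1$ and $n_2$ degrees of freedom respectively, and $S = xx'/n_1$ and
$S^* = x^*x^{*\prime}/n_2$ are the covariance matrices which under the null hypothesis are
independent estimates of the same population covariance matrix, then the joint
distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$
where $A = n_1 S$ and $B = n_2 S^*$ has been obtained by Hsu [1] in 1939. The
distribution density is

(1)    $R(l, \mu, \nu) = \pi^{l/2} \prod_{i=1}^{l} \dfrac{\Gamma\!\left(\frac{l + \mu + \nu + i - 2}{2}\right)}{\Gamma\!\left(\frac{\nu + i - 1}{2}\right)\Gamma\!\left(\frac{\mu + i - 1}{2}\right)\Gamma\!\left(\frac{i}{2}\right)} \prod_{i=1}^{l} \theta_i^{(\mu - 2)/2} \prod_{i=1}^{l} (1 - \theta_i)^{(\nu - 2)/2} \prod_{i<j} (\theta_i - \theta_j),$

$(0 \le \theta_l \le \theta_{l-1} \le \cdots \le \theta_1 \le 1),$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$, and $\nu = n_2 - p + 1$.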

This formula also gives the joint distribution of the squares of canonical
correlations on the null hypothesis that the two sets of variates are independent
[1]. If

$x = \begin{bmatrix} x_{11} & \cdots & x_{1N} \\ x_{21} & \cdots & x_{2N} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pN} \end{bmatrix}$  and  $w = \begin{bmatrix} w_{11} & \cdots & w_{1N} \\ w_{21} & \cdots & w_{2N} \\ \vdots & & \vdots \\ w_{q1} & \cdots & w_{qN} \end{bmatrix}$

are the observations on the two sets of canonical variates and the x's are normally
distributed, independently of the w's, then the equation for the canonical
roots is $|V_{xw} V_{ww}^{-1} V_{wx} - \theta V_{xx}| = 0$, where $\theta_i = r_i^2$ and $V_{xw} = xw'$ etc.

It is observed that $V_{xw} V_{ww}^{-1} V_{wx}$ is like A with $n_1 = q$ and $V_{xx} - V_{xw} V_{ww}^{-1} V_{wx}$ is
like B with $n_2 = N - q - 1$ and the above equation is reduced to the form
$|A - \theta(A + B)| = 0$. It is under this condition that $R(l, \mu, \nu)$ gives the joint
distribution density of $r_1^2, r_2^2, \dots, r_l^2$ where $l = \min(p, q)$, $\mu = |p - q| + 1$,
and $\nu = N - p - q$.

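3. Notation and preliminaries.

(a). Let

$\prod_{i<j} (\theta_i - \theta_j) = \{1, 2, 3, \dots, l\}.$

It is known that the value of the Vandermonde determinant

$\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ \theta_1 & \theta_2 & \theta_3 & \cdots & \theta_l \\ \theta_1^2 & \theta_2^2 & \theta_3^2 & \cdots & \theta_l^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \theta_1^{l-1} & \theta_2^{l-1} & \theta_3^{l-1} & \cdots & \theta_l^{l-1} \end{vmatrix}$

is equal to $\prod_{i>j} (\theta_i - \theta_j) = (-1)^{l(l-1)/2}\{1, 2, 3, \dots, l\}$.

Then

$\begin{vmatrix} 1 & 1 & 1 \\ \theta_1 & \theta_2 & \theta_3 \\ \theta_1^2 & \theta_2^2 & \theta_3^2 \end{vmatrix} = (\theta_2 - \theta_1)(\theta_3 - \theta_2)(\theta_3 - \theta_1) = -\{1, 2, 3\},$

but the determinant can also, by expansion in minors of the first row, be
expressed as

$-[\theta_1\theta_2\{1, 2\} + \theta_2\theta_3\{2, 3\} + \theta_3\theta_1\{3, 1\}]$

where

$\theta_1 - \theta_2 = \{1, 2\}.$

Hence

(2)    $\{1, 2, 3\} = \theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}.$

Similarly

(3)    $\{1, 2, 3, 4\} = \theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\},$

and

(4)    $\{1, 2, 3, 4, 5\} = \theta_1\theta_2\theta_3\theta_4\{1, 2, 3, 4\} + \theta_5\theta_1\theta_2\theta_3\{5, 1, 2, 3\} + \theta_4\theta_5\theta_1\theta_2\{4, 5, 1, 2\} + \theta_3\theta_4\theta_5\theta_1\{3, 4, 5, 1\} + \theta_2\theta_3\theta_4\theta_5\{2, 3, 4, 5\}.$

It is seen that in the successive terms the $\theta$'s are present in a decreasing order.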
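
The expansions (2), (3) and (4) can be spot-checked with exact rational arithmetic. The sketch below is only an illustration; the helper names `bracket` and `cyclic_expansion` are hypothetical, and the sign rule $(-1)^{s(l-1)}$ for the s-th term is an assumption read off from the three displayed cases (all plus signs for odd l, alternating signs for l = 4).

```python
from fractions import Fraction
from itertools import combinations


def bracket(theta, *idx):
    """{i, j, ...}: product of (theta_a - theta_b) over pairs a before b in idx."""
    prod = Fraction(1)
    for a, b in combinations(idx, 2):
        prod *= theta[a] - theta[b]
    return prod


def cyclic_expansion(theta, l):
    """Right-hand side of (2), (3), (4): drop one index at a time, cyclically."""
    total = Fraction(0)
    for s in range(l):
        # kept indices in cyclic order, e.g. l = 4, s = 1 gives (4, 1, 2)
        kept = [((-s + t) % l) + 1 for t in range(l - 1)]
        coeff = Fraction(1)
        for i in kept:
            coeff *= theta[i]
        sign = (-1) ** (s * (l - 1))  # assumed pattern, see lead-in
        total += sign * coeff * bracket(theta, *kept)
    return total


def identity_holds(values):
    """Compare {1, ..., l} with its cyclic expansion for the given theta values."""
    l = len(values)
    theta = {i + 1: Fraction(v) for i, v in enumerate(values)}
    return bracket(theta, *range(1, l + 1)) == cyclic_expansion(theta, l)


if __name__ == "__main__":
    for values in ([3, 2, 1], [4, 3, 2, 1], [5, 4, 3, 2, 1]):
        print(len(values), identity_holds(values))
```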

(b). Let

$(a, b; m, n) = y^m(1 - y)^n \Big|_a^b = b^m(1 - b)^n - a^m(1 - a)^n,$

and

$(a, 1, b; m, n) = \int_a^b y^m(1 - y)^n\, dy;$

then

(4, a)    $(a, 1, b; m + 1, n) = -\dfrac{(a, b; m + 1, n + 1)}{m + n + 2} + \dfrac{m + 1}{m + n + 2}\,(a, 1, b; m, n),$

by a combination of the transformations obtained by partial integration and by
breaking up $(1 - y)^{n+1}$ into $(1 - y)^n - y(1 - y)^n$.
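
Relation (4, a) can be confirmed exactly for non-negative integer m and n, since the integrand is then a polynomial. The following is a minimal sketch under that integer assumption; the function names `pair`, `integral` and `recurrence_rhs` are hypothetical labels for $(a, b; m, n)$, $(a, 1, b; m, n)$ and the right-hand side of (4, a).

```python
from fractions import Fraction
from math import comb


def pair(a, b, m, n):
    """(a, b; m, n) = b^m (1-b)^n - a^m (1-a)^n."""
    a, b = Fraction(a), Fraction(b)
    return b ** m * (1 - b) ** n - a ** m * (1 - a) ** n


def integral(a, b, m, n):
    """(a, 1, b; m, n): integral of y^m (1-y)^n over [a, b], integer m, n >= 0."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for j in range(n + 1):
        p = m + j + 1  # power of y after integrating the j-th binomial term
        total += Fraction((-1) ** j * comb(n, j), p) * (b ** p - a ** p)
    return total


def recurrence_rhs(a, b, m, n):
    """Right-hand side of (4, a), which should equal (a, 1, b; m + 1, n)."""
    s = m + n + 2
    return -pair(a, b, m + 1, n + 1) / s + Fraction(m + 1, s) * integral(a, b, m, n)


if __name__ == "__main__":
    a, b = Fraction(1, 4), Fraction(2, 3)
    for m in range(3):
        for n in range(3):
            print(m, n, integral(a, b, m + 1, n) == recurrence_rhs(a, b, m, n))
```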

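(c) Let

$(a, 2, 1, b; m, n) = \int_{a<\theta_2<\theta_1<b} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\}\, d\theta_1\, d\theta_2,$

$(a, 2, b, 1, c; m, n) = \int_{a<\theta_2<b<\theta_1<c} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\}\, d\theta_1\, d\theta_2,$

and

$(a, 3, b, 2, c, 1, d; m + 1, n) = \int_{a<\theta_3<b<\theta_2<c<\theta_1<d} (\theta_1\theta_2\theta_3)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \{1, 2, 3\}\, d\theta_1\, d\theta_2\, d\theta_3.$

(d) Let

$T_{a,b}^{m,n}\, g(y) = \int_a^b y^m (1 - y)^n g(y)\, dy.$

Then

$T_{a,b}^{m,n}(0, y; h, l) = (a, 1, b; m + h, n + l), \quad (h > 0)$

and

$T_{a,b}^{m,n}(b, 1, c; h, l) = (a, 1, b; m, n)(b, 1, c; h, l).$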

With these preliminaries we proceed to derive the distribution of the roots. 





4. Distribution of the largest root. Let us suppose that the roots are arranged
in decreasing order such that for l roots we have

$0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1.$

If the distribution density $R(l, \mu, \nu)$ given by (1) be expressed as

$R(l, m, n) = C(l, m, n) \prod_{i=1}^{l} \theta_i^m \prod_{i=1}^{l} (1 - \theta_i)^n \prod_{i<j} (\theta_i - \theta_j),$

then the distribution of the largest root in the general case would be given by

$\Pr(\theta_1 \le x) = C(l, m, n)(0, l, l - 1, \dots, 2, 1, x; m, n).$
Now we shall derive the distribution of the largest root for l = 2, 3, 4, and 5.

(a) l = 2.

$\Pr(\theta_1 \le x) = C(2, m, n)(0, 2, 1, x; m, n).$

$(0, 2, 1, x; m, n) = \int_{0<\theta_2<\theta_1<x} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\}\, d\theta_1\, d\theta_2$

$= \int_{0<\theta_2<\theta_1<x} \theta_2^m (1 - \theta_2)^n\, \theta_1^m (1 - \theta_1)^n \{1, 2\}\, d\theta_1\, d\theta_2$

$= \int_{0<\theta_2<\theta_1<x} \theta_2^m (1 - \theta_2)^n\, \theta_1^{m+1} (1 - \theta_1)^n\, d\theta_1\, d\theta_2 - \int_{0<\theta_1<\theta_2<x} \theta_2^m (1 - \theta_2)^n\, \theta_1^{m+1} (1 - \theta_1)^n\, d\theta_1\, d\theta_2.$

The limits in the successive integrals are to be so adjusted as to keep the
integrand the same. Then, using the notation given in section 3(d) and equation (4, a),

(5)    $(0, 2, 1, x; m, n) = T_{0,x}^{m,n}(y, 1, x; m + 1, n) - T_{0,x}^{m,n}(0, 1, y; m + 1, n)$

or

$(0, 2, 1, x; m, n) = T_{0,x}^{m,n}\Big[ -\dfrac{(y, x; m + 1, n + 1)}{m + n + 2} + \dfrac{m + 1}{m + n + 2}(y, 1, x; m, n) + \dfrac{(0, y; m + 1, n + 1)}{m + n + 2} - \dfrac{m + 1}{m + n + 2}(0, 1, y; m, n) \Big].$

Now by a change in the order of integration,

$T_{0,x}^{m,n}[(0, 1, y; m, n) - (y, 1, x; m, n)] = 0.$

Therefore

$(m + n + 2)(0, 2, 1, x; m, n) = T_{0,x}^{m,n}[2(0, y; m + 1, n + 1) - (0, x; m + 1, n + 1)]$

$= 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n).$

Hence

$\Pr(\theta_1 \le x) = \dfrac{C(2, m, n)}{m + n + 2}\,[2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n)]$

$= \dfrac{C(2, m, n)}{m + n + 2}\Big\{ 2\int_0^x y^{2m+1}(1 - y)^{2n+1}\, dy - x^{m+1}(1 - x)^{n+1}\int_0^x y^m (1 - y)^n\, dy \Big\}.$

(b) l = 3. For this case we need certain results for l = 2 which can be easily
obtained and are given below:

(6)    $(a, 2, 1, b; m, n) = \dfrac{2}{m + n + 2}(a, 1, b; 2m + 1, 2n + 1) - \dfrac{1}{m + n + 2}[(0, a; m + 1, n + 1) + (0, b; m + 1, n + 1)] \times (a, 1, b; m, n)$

and

(7)    $(a, 2, b, 1, c; m, n) = \dfrac{1}{m + n + 2}[-(0, a; m + 1, n + 1)(b, 1, c; m, n) + (0, b; m + 1, n + 1)(a, 1, c; m, n) - (0, c; m + 1, n + 1)(a, 1, b; m, n)].$

Now

$(0, 3, 2, 1, x; m, n) = \int_{0<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \{1, 2, 3\}\, d\theta_1\, d\theta_2\, d\theta_3$

$= \int_{0<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n [\theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}]\, d\theta_1\, d\theta_2\, d\theta_3$

(using equation (2))

$= \Big[ \int_{0<\theta_3<\theta_2<\theta_1<x} + \int_{0<\theta_1<\theta_3<\theta_2<x} + \int_{0<\theta_2<\theta_1<\theta_3<x} \Big]\, \theta_3^m (1 - \theta_3)^n (\theta_1\theta_2)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\}\, d\theta_1\, d\theta_2\, d\theta_3,$

or

$(0, 3, 2, 1, x; m, n) = T_{0,x}^{m,n}(y, 2, 1, x; m + 1, n) + T_{0,x}^{m,n}(0, 1, y, 2, x; m + 1, n) + T_{0,x}^{m,n}(0, 2, 1, y; m + 1, n),$

but the $\theta$'s are to be always arranged in the same order, hence

$(0, 3, 2, 1, x; m, n) = T_{0,x}^{m,n}(y, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}(0, 2, y, 1, x; m + 1, n) + T_{0,x}^{m,n}(0, 2, 1, y; m + 1, n).$

Using equations (6) and (7), we have

$(0, 3, 2, 1, x; m, n) = \dfrac{1}{m + n + 3}\, T_{0,x}^{m,n} \{ 2(y, 1, x; 2m + 3, 2n + 1) - (y, 1, x; m + 1, n)[(0, y; m + 2, n + 1) + (0, x; m + 2, n + 1)]$
$\qquad - (0, 1, x; m + 1, n)(0, y; m + 2, n + 1) + (0, 1, y; m + 1, n)(0, x; m + 2, n + 1)$
$\qquad + 2(0, 1, y; 2m + 3, 2n + 1) - (0, 1, y; m + 1, n)(0, y; m + 2, n + 1) \}$

$= \dfrac{1}{m + n + 3}\, T_{0,x}^{m,n} \{ 2(y, 1, x; 2m + 3, 2n + 1) + 2(0, 1, y; 2m + 3, 2n + 1)$
$\qquad - (0, y; m + 2, n + 1)[(0, 1, x; m + 1, n) + (0, 1, y; m + 1, n) + (y, 1, x; m + 1, n)]$
$\qquad - (0, x; m + 2, n + 1)[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)] \}$

$= \dfrac{1}{m + n + 3}\, T_{0,x}^{m,n} \{ 2(0, 1, x; 2m + 3, 2n + 1) - 2(0, y; m + 2, n + 1)(0, 1, x; m + 1, n)$
$\qquad - (0, x; m + 2, n + 1)[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)] \}.$

Using equation (5), we have

$(0, 3, 2, 1, x; m, n) = \dfrac{1}{m + n + 3} \{ 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \}.$

Hence

(8)    $\Pr(\theta_1 \le x) = \dfrac{C(3, m, n)}{m + n + 3} \{ 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \}.$
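
The closed forms obtained for l = 2 and for l = 3 (the bracket in equation (8)) can be compared with a direct evaluation of the ordered integral $(0, l, \dots, 1, x; m, n)$. The sketch below is an illustration under the assumption that m and n are non-negative integers, so that every integrand is a polynomial and exact rational arithmetic applies. The function names are hypothetical, and the Vandermonde product is expanded by the Leibniz formula rather than by the cyclic expansions of section 3(a).

```python
from fractions import Fraction
from itertools import permutations


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_int(p):
    """Antiderivative that vanishes at 0."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def poly_eval(p, x):
    return sum(c * x ** k for k, c in enumerate(p))


def weight(m, n):
    """Coefficients of y^m (1 - y)^n."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(1), Fraction(-1)])
    return [Fraction(0)] * m + p


def sign(perm):
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s


def ordered_integral(l, x, m, n):
    """(0, l, ..., 2, 1, x; m, n) over 0 < theta_l < ... < theta_1 < x."""
    w = weight(m, n)
    total = Fraction(0)
    for perm in permutations(range(l)):
        # Leibniz term of prod_{i<j}(theta_i - theta_j): theta_i^(l-1-perm[i])
        inner = [Fraction(1)]
        for i in reversed(range(l)):  # innermost variable is theta_l
            h = [Fraction(0)] * (l - 1 - perm[i]) + w
            inner = poly_int(poly_mul(h, inner))
        total += sign(perm) * poly_eval(inner, x)
    return total


def I(x, p, q):
    """(0, 1, x; p, q)."""
    return poly_eval(poly_int(weight(p, q)), x)


def g(x, p, q):
    """(0, x; p, q) for p > 0."""
    return x ** p * (1 - x) ** q


def closed_form_l2(x, m, n):
    return (2 * I(x, 2 * m + 1, 2 * n + 1) - g(x, m + 1, n + 1) * I(x, m, n)) / (m + n + 2)


def closed_form_l3(x, m, n):
    return (2 * I(x, 2 * m + 3, 2 * n + 1) * I(x, m, n)
            - 2 * I(x, 2 * m + 2, 2 * n + 1) * I(x, m + 1, n)
            - g(x, m + 2, n + 1) * closed_form_l2(x, m, n)) / (m + n + 3)


if __name__ == "__main__":
    x = Fraction(1, 2)
    print(ordered_integral(2, x, 0, 0), closed_form_l2(x, 0, 0))
    print(ordered_integral(3, x, 1, 2), closed_form_l3(x, 1, 2))
```

Dividing either quantity by its value at x = 1 gives $\Pr(\theta_1 \le x)$, since $C(l, m, n)$ is the reciprocal of $(0, l, \dots, 1, 1; m, n)$.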
(c) l = 4. In order to determine (0, 4, 3, 2, 1, x; m, n) we need the values of
(a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1, c; m, n), which are
obtained according to the procedure given above.

Now

$(0, 4, 3, 2, 1, x; m, n) = \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3\theta_4)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n (1 - \theta_4)^n \{1, 2, 3, 4\}\, d\theta_1\, d\theta_2\, d\theta_3\, d\theta_4$

$= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3\theta_4)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n (1 - \theta_4)^n [\theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\}]\, d\theta_1\, d\theta_2\, d\theta_3\, d\theta_4$

$= \Big[ \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} - \int_{0<\theta_1<\theta_4<\theta_3<\theta_2<x} + \int_{0<\theta_2<\theta_1<\theta_4<\theta_3<x} - \int_{0<\theta_3<\theta_2<\theta_1<\theta_4<x} \Big]\, \theta_4^m (1 - \theta_4)^n [(\theta_1\theta_2\theta_3)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n] \{1, 2, 3\}\, d\theta_1\, d\theta_2\, d\theta_3\, d\theta_4$

$= T_{0,x}^{m,n}(y, 3, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}(0, 1, y, 3, 2, x; m + 1, n) + T_{0,x}^{m,n}(0, 2, 1, y, 3, x; m + 1, n) - T_{0,x}^{m,n}(0, 3, 2, 1, y; m + 1, n)$

$= T_{0,x}^{m,n}(y, 3, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}(0, 3, y, 2, 1, x; m + 1, n) + T_{0,x}^{m,n}(0, 3, 2, y, 1, x; m + 1, n) - T_{0,x}^{m,n}(0, 3, 2, 1, y; m + 1, n).$

Using the results of (a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1,
c; m, n), we have $\Pr(\theta_1 \le x)$ equal to

(9)    $C(4, m, n)(0, 4, 3, 2, 1, x; m, n) = \dfrac{C(4, m, n)}{m + n + 4} \Big\{ 2(0, 1, x; 2m + 5, 2n + 1)(0, 2, 1, x; m, n)$
$\qquad - \dfrac{2(0, 1, x; 2m + 4, 2n + 1)}{m + n + 3}\,[2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n)]$
$\qquad + 2(0, 1, x; 2m + 3, 2n + 1)(0, 2, 1, x; m + 1, n) - (0, x; m + 3, n + 1)(0, 3, 2, 1, x; m, n) \Big\}.$

(d) l = 5. In the evaluation of the distribution of the largest root for l = 5
the following parts need to be calculated:

(a, 4, 3, 2, 1, b; m, n), (a, 4, b, 3, 2, 1, c; m, n), (a, 4, 3, b, 2, 1, c; m, n),
(a, 4, 3, 2, b, 1, c; m, n).

Proceeding along the lines indicated in the previous sections we get

(10)    $\Pr(\theta_1 \le x) = \dfrac{C(5, m, n)}{m + n + 5} \Big[ 2(0, 1, x; 2m + 7, 2n + 1)(0, 3, 2, 1, x; m, n)$
$\qquad - \dfrac{2(0, 1, x; 2m + 6, 2n + 1)}{m + n + 4} \{ 2(0, 1, x; 2m + 4, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 1, n)$
$\qquad\qquad - (0, x; m + 3, n + 1)(0, 2, 1, x; m, n) + (m + 3)(0, 3, 2, 1, x; m, n) \}$
$\qquad + \dfrac{2(0, 1, x; 2m + 5, 2n + 1)}{m + n + 4} \Big\{ 2(0, 1, x; 2m + 5, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 2, n)$
$\qquad\qquad - \dfrac{(0, x; m + 3, n + 1)}{m + n + 3}\,[2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n)] \Big\}$
$\qquad - 2(0, 3, 2, 1, x; m + 1, n)(0, 1, x; 2m + 4, 2n + 1) - (0, x; m + 4, n + 1)(0, 4, 3, 2, 1, x; m, n) \Big].$

It is evident now that the above method can be used to derive the distribution
for any value of l.


5. Distribution of the smallest root. Let $\Pr[\theta_1 \le x / \mu, \nu] = P(x / \mu, \nu)$ where
$\theta_1$ is the largest root. Let us make the following transformations in the R(l, m, n)
distribution:

$r_1 = 1 - \theta_l$

$r_2 = 1 - \theta_{l-1}$

$\cdots$

$r_l = 1 - \theta_1;$

then since $0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1$, we have $0 < r_l < \cdots < r_1 < 1$,
and thus the domain of integration does not change. Hence the joint
distribution of the r's can be expressed as

$C(l, m, n) \prod_{i=1}^{l} r_i^n \prod_{i=1}^{l} (1 - r_i)^m \prod_{i<j} (r_i - r_j), \quad 0 < r_l < \cdots < r_1 < 1.$

Thus the r's have the same distribution as the $\theta$'s, but $\mu$ and $\nu$ are interchanged.
Therefore

$\Pr(\theta_l \le x) = \Pr(1 - r_1 \le x) = 1 - \Pr(r_1 \le 1 - x) = 1 - P(1 - x / \nu, \mu).$

Hence, for getting the distribution of the smallest root, we have to change x
into 1 - x and interchange m, n in the distributions of the largest roots and
subtract the resultant probability from 1. The distributions for the smallest root
are given below for l = 2, 3, 4 and 5.

(i) l = 2.

(11)    $\Pr(\theta_2 \le x) = 1 - \Pr(\theta_1 \le 1 - x / n, m) = 1 - \dfrac{C(2, n, m)}{m + n + 2} \{ 2(0, 1, 1 - x; 2n + 1, 2m + 1) - (0, 1 - x; n + 1, m + 1)(0, 1, 1 - x; n, m) \}.$

(ii) l = 3.

(12)    $\Pr(\theta_3 \le x) = 1 - \dfrac{C(3, n, m)}{m + n + 3} \{ 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n, m) - 2(0, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 2, 2m + 1) - (0, 1 - x; n + 2, m + 1)(0, 2, 1, 1 - x; n, m) \}.$

(iii) l = 4.

(13)    $\Pr(\theta_4 \le x) = 1 - \dfrac{C(4, n, m)}{m + n + 4} \Big\{ 2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 2, 1, 1 - x; n, m)$
$\qquad - \dfrac{2(0, 1, 1 - x; 2n + 4, 2m + 1)}{m + n + 3}\,[2(0, 1, 1 - x; 2n + 2, 2m + 1) - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m)]$
$\qquad + 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 2, 1, 1 - x; n + 1, m) - (0, 1 - x; n + 3, m + 1)(0, 3, 2, 1, 1 - x; n, m) \Big\}.$



(iv) l = 5.

(14)    $\Pr(\theta_5 \le x) = 1 - \dfrac{C(5, n, m)}{m + n + 5} \Big[ 2(0, 1, 1 - x; 2n + 7, 2m + 1)(0, 3, 2, 1, 1 - x; n, m)$
$\qquad - \dfrac{2(0, 1, 1 - x; 2n + 6, 2m + 1)}{m + n + 4} \{ 2(0, 1, 1 - x; 2n + 4, 2m + 1)(0, 1, 1 - x; n, m) - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 1, m)$
$\qquad\qquad - (0, 1 - x; n + 3, m + 1)(0, 2, 1, 1 - x; n, m) + (n + 3)(0, 3, 2, 1, 1 - x; n, m) \}$
$\qquad + \dfrac{2(0, 1, 1 - x; 2n + 5, 2m + 1)}{m + n + 4} \Big\{ 2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 1, 1 - x; n, m) - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 2, m)$
$\qquad\qquad - \dfrac{(0, 1 - x; n + 3, m + 1)}{m + n + 3}\,[2(0, 1, 1 - x; 2n + 2, 2m + 1) - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m)] \Big\}$
$\qquad - 2(0, 3, 2, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 4, 2m + 1) - (0, 1 - x; n + 4, m + 1)(0, 4, 3, 2, 1, 1 - x; n, m) \Big].$


6. Distribution of any intermediate root.

(i) l = 3.

$\Pr(\theta_2 \le x) = \Pr(0 < \theta_3 < \theta_2 < \theta_1 < x) + \Pr(0 < \theta_3 < \theta_2 < x < \theta_1)$
$= C(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1; m, n)]$

as the two events are mutually exclusive, or

$\Pr(\theta_2 \le x) = C(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)]$, where z = 1,

(15)    $= \dfrac{C(3, m, n)}{m + n + 3} \Big\{ 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1)$
$\qquad - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)$
$\qquad + (x, 1, z; m, n)[2(0, 1, x; 2m + 3, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m + 1, n)]$
$\qquad - \dfrac{(x, z; m + 2, n + 1)}{m + n + 2}\,[2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n)]$
$\qquad + (x, 1, z; m + 1, n)(0, 1, x; m, n)(0, x; m + 2, n + 1) - 2(x, 1, z; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) \Big\}.$



(ii) l = 4.

$\Pr(\theta_2 \le x) = \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x; m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1; m, n)$
$= C(4, m, n)[(0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n)]$

and

$\Pr(\theta_3 \le x) = \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x; m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1; m, n) + \Pr(0 < \theta_4 < \theta_3 < x < \theta_2 < \theta_1; m, n)$
$= C(4, m, n)[(0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n) + (0, 4, 3, x, 2, 1; m, n)].$

The different parts of these probabilities can be evaluated as indicated in section
4(d). Thus the method already indicated to obtain the distribution of the
largest root also gives the distribution of any one of the roots.

7. Further problems. It is intended to prepare the probability distribution 
tables for small values of l. The results obtained in this paper are found to be 
useful in finding the distribution of the sum of the roots when the numbers of 
canonical variates in two sets differ by one. This problem is, however, being 
investigated further. 

Acknowledgements. The author is highly indebted to Dr. P. L. Hsu for suggesting
the problem and for guiding in this research, and is also thankful to Dr.
Harold Hotelling for his suggestions and help in this work.

REFERENCES 

[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations", Annals of
Eugenics, Vol. 3 (1939), pp. 250-268.

[2] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and
any intermediate of the 'p'-statistics on the null hypothesis", Sankhyā, December,
1943.



A k-SAMPLE SLIPPAGE TEST FOR AN EXTREME POPULATION

By Frederick Mosteller 
Harvard University 

1. Summary. A test is proposed for deciding whether one of k populations
has slipped to the right of the rest, under the null hypothesis that all populations
are continuous and identical. The procedure is to pick the sample with the largest
observation, and to count the number of observations r in it which exceed all
observations of all other samples. If all samples are of the same size n, n large,
the probability of getting r or more such observations, when the null hypothesis
is true, is about $k^{1-r}$.

Some remarks are made about kinds of errors in testing hypotheses. 

2. Introduction. The purpose of this paper is to describe a significance test
connected with a statistical question called by the present author "the problem
of the greatest one." Suppose there are several continuous populations $f(x - a_1)$,
$f(x - a_2), \dots, f(x - a_k)$, which are identical except for rigid translations or
slippages. Suppose further that the form of the populations and the values of
the $a_i$ are unknown. Then on the basis of samples from the k populations we
may wish to test the hypothesis that some population has slipped further to the
right, say, than any other. In other words, we may ask whether there exists an
$a_i > \max(a_1, a_2, \dots, a_{i-1}, a_{i+1}, \dots, a_k)$. From the point of view of testing
hypotheses, the existence of such an $a_i$ is taken to be the alternative hypothesis.
A significance test will depend also on the null hypothesis. We shall take as the
null hypothesis the assumption that all the a's are equal: $a_1 = a_2 = \cdots = a_k$.

Using these assumptions it is possible to obtain parameter-free significance
tests that some population has a larger location parameter (mean, median, quantile,
say) than any of the other populations.

The problem of the greatest one is of considerable practical importance. 
Among several processes, techniques, or therapies of approximately equal cost, 
we often wish to pick out the best one as measured by some characteristic. 
Furthermore, we often wish to make a test of the significance of one of the 
methods against the others after noticing that on the basis of the sample values, 
a particular method seems to be best. The test provided in this paper allows 
an opportunity for inspection of the data before applying the test of significance. 

The proposed test has the advantage of being rapid and easy to apply. However,
the test is probably not very powerful, and in the form presented here, the
test depends on having samples of the same size from each of the several populations.
The equal-sample restriction is not essential to the technique, but since
no very useful way of computing the significance levels for the unequal-sample
case is known to the author, it does not seem worthwhile to give the formulas.
They are easy to write down.


3. The test. Suppose we have k samples of size n each. It is desired to
test the alternative hypothesis that one of the populations, from which the
samples were drawn, has been rigidly translated to the right relative to the
remaining populations. The null hypothesis is that all the populations have the
same location parameter.

The test consists in arranging the observations in all the samples from greatest
to least, and observing for the sample with the largest observation, the number
of observations r which exceed all the observations in the k - 1 other samples.
If $r \ge r_0$ we accept the hypothesis that the population whose sample contains the
largest observation has slipped to the right of the rest and reject the null hypothesis
that all the populations are identical; instead we accept the hypothesis
that the sample with the largest observation came from the population with the
rightmost location parameter. If $r < r_0$, we accept the null hypothesis.

The statements just made are not quite usual for accepting and rejecting
hypotheses. Classically one would merely accept or reject the hypothesis that
the $a_i$ are all equal. The statements just made seem preferable for the present
purpose.

Example. The following data arranged from least to greatest indicate the 
difference in log reaction times of an individual and a control group to three 
types of words on a word-association test. The differences in log reaction 
times have been multiplied by 100 for convenience. Longer reaction times for 
the individual are positive, shorter ones are negative. Does one type of word 
require a shorter reaction time for the individual relative to the control group 
than any other? 


Concrete    Abstract    Emotional

   -6         -16          -6
   -6         -11          -5
   -5          -3          -3
   -5          -2          -2
   -4          -2          -1
   -3          -1           0
   -1          -1           1
    0           1           3
    0           1           5
    3           1          12
    9           8          13
   11          10          13
   12          16          15
   29          20          28


Here we have k = 3 samples of size n = 14 each. We note that the Abstract
column has the most negative deviation, -16, and that there are two observations
in that column which are less than all the observations in the other columns.
Consequently r = 2. Under the null hypothesis the probability of obtaining
2 or more observations in one column less than all the observations in
the others is about .33, so the null hypothesis is not rejected.
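
The count r in this example can be reproduced mechanically. The sketch below is illustrative only: the function name `slippage_statistic` and its `direction` argument are hypothetical, and the exact probability uses the closed form $k\binom{n}{r}/\binom{kn}{r}$ for equal sample sizes, of which the value .33 quoted above is the large-n approximation.

```python
from fractions import Fraction
from math import comb

CONCRETE = [-6, -6, -5, -5, -4, -3, -1, 0, 0, 3, 9, 11, 12, 29]
ABSTRACT = [-16, -11, -3, -2, -2, -1, -1, 1, 1, 1, 8, 10, 16, 20]
EMOTIONAL = [-6, -5, -3, -2, -1, 0, 1, 3, 5, 12, 13, 13, 15, 28]


def slippage_statistic(samples, direction="right"):
    """Return (index of the extreme sample, r).

    r counts the observations of the extreme sample that lie beyond every
    observation of all other samples. direction="left" looks for the smallest
    values by negating the data first.
    """
    if direction == "left":
        samples = [[-v for v in s] for s in samples]
    maxima = [max(s) for s in samples]
    top = max(range(len(samples)), key=lambda i: maxima[i])
    rest_max = max(m for i, m in enumerate(maxima) if i != top)
    r = sum(1 for v in samples[top] if v > rest_max)
    return top, r


if __name__ == "__main__":
    samples = [CONCRETE, ABSTRACT, EMOTIONAL]
    top, r = slippage_statistic(samples, direction="left")
    k, n = len(samples), len(samples[0])
    p = Fraction(k * comb(n, r), comb(k * n, r))  # exact null probability
    print(top, r, p, float(p))
```

For these data the extreme sample is the Abstract column with r = 2, and the exact null probability is 13/41, close to the rounded figure used in the text.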


4. Derivation of test. Suppose we have k samples of size n, all drawn from
the same continuous distribution function f(x). Arranging observations within
samples in order of magnitude the samples $O_i$ are: $O_1$: $x_{11}, x_{12}, \dots, x_{1n}$; $O_2$:
$x_{21}, x_{22}, \dots, x_{2n}$; $\dots$; $O_k$: $x_{k1}, x_{k2}, \dots, x_{kn}$.

If we consider some one sample $O_i$, separately, we can inquire about the
probability that exactly r of its observations are greater than the greatest
observation in the other k - 1 samples.

The total number of arrangements of the kn observations is

(1)    $T = \dfrac{(kn)!}{(n!)^k}.$

The number of ways of getting all n observations of $O_i$ to be greater than all
observations in the remaining samples is

(2)    $N(n) = \dfrac{[(k - 1)n]!}{(n!)^{k-1}\, 0!}.$

The number of ways of getting exactly n - 1 observations of $O_i$ greater than
all observations in the remaining samples is

(3)    $N(n - 1) = \dfrac{[(k - 1)n + 1]!}{(n!)^{k-1}\, 1!} - \dfrac{[(k - 1)n]!}{(n!)^{k-1}\, 0!}.$

More generally, the number of ways of getting exactly r = n - u of $O_i$ to be
greater than all other observations in the remaining samples is

(4)    $N(n - u) = \dfrac{[(k - 1)n + u]!}{(n!)^{k-1}\, u!} - \dfrac{[(k - 1)n + u - 1]!}{(n!)^{k-1}\, (u - 1)!}.$

Therefore the number of ways of getting a run of r = n - u or more observations
in $O_i$ greater than the rest is just

(5)    $S(n - u) = \sum_{t = n - u}^{n} N(t) = \dfrac{[(k - 1)n + u]!}{(n!)^{k-1}\, u!}.$

However we do not choose our sample $O_i$ at random or preassign it, as the
demonstration has thus far supposed. Instead we choose that $O_i$ which has
the greatest observation in all the samples. This condition requires us to multiply
S(n - u) by the factor k. Consequently the probability that the sample
with the largest observation has r = n - u or more observations which exceed
all observations in the other k - 1 samples is given by

(6)    $P(r) = \dfrac{k\,S(r)}{T} = \dfrac{k\,(n!)\,(kn - r)!}{(kn)!\,(n - r)!}.$

As an incidental check we note in passing that

$P(1) = \dfrac{k\,(n!)\,(kn - 1)!}{(kn)!\,(n - 1)!} = \dfrac{kn}{kn} = 1.$

We note that equation (6) may be rewritten as

(7)    $P(r) = k \dbinom{n}{r} \Big/ \dbinom{kn}{r},$

which is a useful form for some computations.
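
Equation (7) is straightforward to evaluate, and for very small k and n it can be compared with a brute-force count over all distinguishable arrangements of the sample labels. The sketch below is an illustration; the function names `slippage_pvalue` and `brute_force_pvalue` are hypothetical, and the enumeration is only practical when kn is small.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def slippage_pvalue(k, n, r):
    """Equation (7): P(r) = k * C(n, r) / C(kn, r), as an exact fraction."""
    return Fraction(k * comb(n, r), comb(k * n, r))


def brute_force_pvalue(k, n, r):
    """Count arrangements whose r largest observations share one sample label.

    Each arrangement lists sample labels from the greatest observation down;
    under the null hypothesis all distinguishable arrangements are equally likely.
    """
    labels = [s for s in range(k) for _ in range(n)]
    arrangements = set(permutations(labels))
    hits = 0
    for arr in arrangements:
        run = 0
        for lab in arr:
            if lab != arr[0]:
                break
            run += 1
        if run >= r:
            hits += 1
    return Fraction(hits, len(arrangements))


if __name__ == "__main__":
    print(slippage_pvalue(3, 2, 2), brute_force_pvalue(3, 2, 2))
    # approach to the large-n value k**(1 - r) for k = 3, r = 3
    for n in (5, 15, 25, 1000):
        print(n, round(float(slippage_pvalue(3, n, 3)), 4), round(3 ** (1 - 3), 4))
```

The second loop shows how slowly the exact probability approaches $k^{1-r}$, which is relevant when deciding whether the limiting value is accurate enough for a given n.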

Table I gives the probability of observing r or more observations in the 
sample with the largest observation, among k samples of size n, which are more 
extreme in a preassigned direction than any of the observations in the remaining 
k — 1 samples. 

5. Approximations. If we use Stirling's formula and approximations for
$(1 + a)^r$, for small values of a and r, we can write an approximation for equation
(6) for large values of n with r and k fixed as follows

For very large n equation (8) yields

(9)    $P(r) \approx k^{1-r},$

which is the value given in Table I for n = ∞. For many purposes the result
given by equation (9) is quite adequate, as a glance at Table I will indicate.

6. Kinds of errors. In tests such as the one being considered here the classical 
two kinds of errors are not quite adequate to describe the situation. 

As usual we may make the errors of 

I) rejecting the null hypothesis when it is true, 

II) accepting the null hypothesis when it is false. 

But there is a third kind of error which is of interest because the present test of 
significance is tied up closely with the idea of making a correct decision about 
which distribution function has slipped furthest to the right. We may make 
the error of 

III) correctly rejecting the null hypothesis for the wrong reason. 

In other words it is possible for the null hypothesis to be false. It is also possible
to reject the null hypothesis because some sample $O_i$ has too many observations
which are greater than all observations in the other samples. But
the population from which some other sample, say $O_j$, is drawn is in fact the rightmost
population. In this case we have committed an error of the third kind.

When we come to the power of the test under consideration we shall compute
the probability that we reject the null hypothesis because the rightmost population
yields a sample with too many large observations. Thus by the power of
this test we shall mean the probability of both correct rejection and correct
choice of rightmost population, when it exists.

TABLE I

Probability of one of k samples of size n each having r or more observations larger
than those of the other k - 1 samples

k = 2
   n \ r      2       3       4       5       6
    3       ....    .100
    5       .444    .167    .048    .008
    7       .462    .192    .070    .021    .005
   10       .474    .211    .087    .033    .011
   15       .483    .224    .100    .042    .017
   20       .487    .231    ....    ....    ....
   25       ....    .235    ....    ....    ....
    ∞       ....    ....    .125    ....    ....

k = 3
   n \ r      2       3       4       5       6
    3       ....    .036
    5       .286    .066    .011    .001
    7       ....    .079    .018    .003    ....
   10       ....    .089    .023    .005    ....
   15       .318    .096    .027    .007    .0018
   20       .322    ....    ....    .009    .0023
   25       .324    ....    .031    .009    ....
    ∞       .333    .111    .037    .012    ....

k = 4
   n \ r      2       3       4       5       6
    3       .182    .018
    5       .211    .035    .004    .0003
    7       .222    .043    .007    .0009   .0001
   10       .231    .049    .009    .0015   .0002
   15       .237    .053    .011    .0022   .0004
   20       .241    .056    .012    .0026   .0005
   25       .242    .057    .013    .0028   .0006
    ∞       .250    .062    .016    .0039   .0010

k = 5
   n \ r      2       3       4       5
    3       .143    .011
    5       .167    .022    .0020   .0001
    7       .177    .027    .0033   .0003
   10       .184    .031    .0046   .0006
   15       .189    .034    .0056   .0008
   20       .192    .035    .0062   .0010
   25       .194    .036    .0065   .0011
    ∞       .200    .040    .0080   .0016

k = 6
   n \ r      2       3       4       5
    3       .118    .007
    5       .138    .015    .0011   .0000
    7       .146    .018    .0019   .0001
   10       .152    .021    .0026   .0003
   15       .157    .023    .0032   .0004
   20       .160    .024    .0035   .0005
   25       .161    .025    .0037   .0005
    ∞       .167    .028    .0046   .0008

Errors of the third kind happen in conventional tests of differences of means,
but they are usually not considered although their existence is probably recognized.
It seems to the author that there may be several reasons for this among
which are 1) a preoccupation on the part of mathematical statisticians with the
formal questions of acceptance and rejection of null hypotheses without adequate
consideration of the implications of the error of the third kind for the
practical experimenter, 2) the rarity with which an error of the third kind arises
in the usual tests of significance.

In passing we note further that it is possible in the present problem for both 
the null hypothesis and the alternative hypothesis to be false when k > 2. This 
may happen when there are, say, two identical rightmost populations, and the 
remaining populations are shifted to the left. An examination of Table I will 
give us an idea of what will happen in such a case. If k = 4, we use r = 3 as 
about the .05 level. If two of the populations are slipped very far to the left, 
while the rightmost two populations are identical, in effect k = 2. In this case 
the probability of rejecting the null hypothesis is around .2. Consequently we 
accept the null hypothesis about 80 per cent of the time, and reject it 20 per cent 
of the time under these conditions. But neither hypothesis was true. 

If we carry the discussion to its ultimate conclusion we would need a fourth 
kind of error for these troublesome situations. There are still other kinds of 
errors which will not be considered here. 

7. The power of the test. It is difficult to discuss the power of a non-parametric
test, but in the present case it may be worthwhile to give an example or
two. The reader will understand that although the test is called non-parametric,
its power does depend on the distribution function.

In the case of k samples there are two extremes which might be considered for
any particular form of distribution function. In Case 1, we suppose that
when the alternative hypothesis is true, k - 1 of the populations are identical
with distribution function f(x), while the remaining distribution function is
f(x - a), a > 0. Case 1 may be regarded as a lower bound to the power of the
test because for any fixed distance a between the location parameters of the
rightmost population and the next rightmost population, Case 1 gives the least
chance of detecting the falsity of the null hypothesis.

In Case 2, we suppose that the rightmost population is f(x - a), a > 0 as
before, that the next rightmost population is f(x), and that the other k - 2
populations have slipped so far to the left that they make no contribution to
the problem of the power. This is an optimistic approach to the power because it
gives an upper bound to the power. When k = 2, Case 1 and Case 2 are identical,
and the power is exactly the power of the test for the particular distribution function
under consideration.

Case 3, which we shall not consider, deals with the situation where there is more
than one rightmost population, but the null hypothesis is false. It is connected
with the fourth kind of error mentioned at the end of section 6.

Table II gives the upper and lower bound of the power of the test for k = 3,
r = 3, n = 3, when the distribution is uniform and of length unity. The parameter
a is the distance between the location parameter of the rightmost distribution
and that of the next rightmost distribution.
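The two bounds for the uniform case can be sketched numerically. The block below is an illustration only, not the author's computation: it assumes that with r = n = 3 the test rejects in favor of the rightmost sample exactly when all three of its observations exceed every observation in the competing samples, and it integrates over the largest competing observation with a midpoint rule. The function names are hypothetical.

```python
# Sketch: power bounds for uniform populations of length 1 (k = 3, n = r = 3).
# Assumption: rejection occurs when all n observations of the rightmost sample
# exceed every observation drawn from the competing populations.

def power_uniform(a, n=3, competitors=3, steps=20000):
    """P(min of n draws from U(a, 1+a) > max of `competitors` draws from U(0, 1))."""
    if a >= 1.0:
        return 1.0
    c = competitors
    # If the largest competitor M is below a, the shifted sample always wins.
    total = a ** c
    # Otherwise integrate P(min > m) = (1 + a - m)^n against the density of M.
    h = (1.0 - a) / steps
    for j in range(steps):
        m = a + (j + 0.5) * h
        total += (1.0 + a - m) ** n * c * m ** (c - 1) * h
    return total

def upper_bound(a):
    # Case 2: only one competing sample of size 3 matters.
    return power_uniform(a, competitors=3)

def lower_bound(a):
    # Case 1: two identical competing samples, 6 observations in all.
    return power_uniform(a, competitors=6)

if __name__ == "__main__":
    for a in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        print(a, round(upper_bound(a), 2), round(lower_bound(a), 2))
```

Under these assumptions the sketch returns values that round to .05 and .01 at a = 0 and to .46 and .29 at a = .5, in agreement with the corresponding entries of Table II.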

In Table III we give some points on the upper and lower bounds of the power 
of the test for the normal distribution with unit standard deviation. The param¬ 
eter a is the distance between the mean of the rightmost normal distribution and 
the next rightmost, measured in standard deviations. Again we use the case 
k = 3, r = 3, n = 3. 


TABLE II

Power p of the test for the uniform distribution when k = 3, r = 3, n = 3. The
distance between the midpoints of the two rightmost distributions is a

a                    0     .1    .3    .5    .7    .9    1.00
Upper bound p_u     .05   .09   .23   .46   .73   .96   1.00
Lower bound p_l     .01   .03   .11   .29   .59   .93   1.00


TABLE III

Power p of the test for the unit normal when k = 3, r = 3, n = 3. The distance
between the means of the two rightmost distributions, measured in standard
deviations, is a

a                    0     .5    1.0   1.5   2.0   2.5   3.0
Upper bound p_u     .05   .13   .26   .42   .58   .71   .87
Lower bound p_l     .01   .04   .14   .27   .43   .60   .80


The power of the test has been defined as the probability of correctly rejecting 
the null hypothesis and finding the sample from the rightmost population to be 
the extreme one. This raises a question about the meaning of the entries in 
Tables II and III under a = 0. When a = 0 there is no way to reject the null 
hypothesis correctly. The probabilities given are the probabilities that a 
randomly chosen sample will force a rejection of the null hypothesis. They 
represent the limit of the power function as a tends to zero. If we think of ear¬ 
marking the sample from the rightmost population and of computing the prob¬ 
ability repeatedly that that sample will have three observations larger than all the 
observations in the other sample, and then we let a tend to zero, this is the result 
we get. These values are not the significance levels. The significance level is 
.036. 
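The entries under a = 0 and the significance level can be checked by simple counting. The sketch below assumes, as in the paragraph above, that k = 3 samples of n = 3 observations are pooled and that all orderings are equally likely when a = 0; the variable names are illustrative.

```python
from math import comb

n, k = 3, 3

# Chance that one earmarked sample holds the n largest of the pooled observations.
p_lower = 1 / comb(k * n, n)        # competing against k - 1 = 2 samples: 1/84
p_upper = 1 / comb(2 * n, n)        # competing against a single sample:   1/20

# Under the null hypothesis any of the k samples may turn out to be the extreme one.
significance = k / comb(k * n, n)   # 3/84

print(round(p_upper, 2), round(p_lower, 2), round(significance, 3))
```

The three quantities round to .05, .01 and .036, which is why the a = 0 entries of the tables differ from the significance level.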





8. Discussion. The reader may rightly feel that the solution here presented
to the problem of the greatest one depends on a trick. That is, it depends
intimately on the choice of the null hypothesis. Furthermore the reader may
feel that the choice of a_1 = a_2 = ... = a_k is neither an interesting null
hypothesis nor one which is likely to arise in a practical situation. The author has
no quarrel with this attitude. This means that there are many other approaches
to this problem which are worth trying. The equal-location-parameter case is
one which yields easily to non-parametric methods.

It will be noted that a useful technique has been indicated which allows one 
to examine the data before making the significance test. In general one may 
wish to set up a test function, decide which of several samples provides the ex¬ 
treme value of the function, and then test significance given that we have chosen 
that sample which maximizes the function among the k samples under con¬ 
sideration. 

9. Conclusion. There is a large class of problems grouped around “the prob¬ 
lem of the greatest one”. First it would be useful to have a more powerful test 
than the one here proposed. Second, there is the problem of deciding on the 
basis of samples whether we have successfully predicted the order of the location 
parameters of several populations. Third, there is the general problem of what 
alternatives, what null hypotheses, and what test functions to use in treating 
samples from more than two populations. It is to be hoped that more material 
on these problems will appear, because answers to these questions are urgently 
needed in practical problems. 



ON THE UNIQUENESS OF SIMILAR REGIONS 

By Paul G. Hoel 

University of California at Los Angeles 

1. Summary. Conditions are determined for insuring that Neyman's method
of constructing similar regions by means of sufficient statistics will yield all such
regions when such statistics exist.

2. Introduction. In designing tests of composite hypotheses, one encounters 
the problem of how to construct similar regions and whether the construction 
process yields all possible similar regions. Neyman has derived methods for ob¬ 
taining similar regions when the basic distribution function satisfies certain par¬ 
tial differential equations [1] and also when a sufficient set of statistics exists for 
the unknown parameters [2]. In the former case, the construction process gave 
all such regions; however the question of whether certain subregions were inde¬ 
pendent of the parameters was left unanswered. In the latter case, the indepen¬ 
dence was obvious, but the question of uniqueness was not considered. In 
obtaining sufficient conditions for the existence of a type B region, Scheffé [3]
employed Neyman’s differential equations assumptions and methods and demon¬ 
strated that the subregions were independent of the parameters. 

The method of constructing similar regions by means of sufficient statistics is 
much simpler to demonstrate than is the method based on differential equations. 
It also has the advantage that the independence of the subregions requires no 
proof. It possesses the disadvantage that the question of uniqueness is not 
answered. This question can be answered by showing that the assumption of a 
sufficient set of statistics includes the differential equations assumption and then 
employing methods based on the latter assumption. Such a procedure would 
deprive the sufficiency method of its simplicity; consequently a relatively simple 
direct proof of uniqueness has been constructed. The method of proof also shows 
the equivalence of the two methods of constructing similar regions. 

3. Sufficient conditions for uniqueness. Consider a distribution function,
$f(x \mid \theta_1, \dots, \theta_\nu)$, of the variable $x$ that depends upon the $\nu$ parameters
$\theta_1, \dots, \theta_\nu$. Let $x_1, x_2, \dots, x_n$ denote a random sample from this distribution
and let $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu)$ denote the distribution function of such a
sample. It will be assumed that $n > \nu$.

Suppose there exists a sufficient set of statistics $T_1(x_1, \dots, x_n), \dots,
T_\nu(x_1, \dots, x_n)$ with respect to the parameters $\theta_1, \dots, \theta_\nu$. Koopman [4] has
shown that if the $T$'s are continuous and if $f(x \mid \theta_1, \dots, \theta_\nu)$ is analytic, then
$f(x \mid \theta_1, \dots, \theta_\nu)$ must be a function of the form

(1)    $f(x \mid \theta_1, \dots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k X_k + \Theta + X\Big]$,

where the $\Theta_k$ and $\Theta$ are single-valued analytic functions of the $\theta$'s only, and the
$X_k$ and $X$ are single-valued analytic functions of $x$ only. He has also shown that
if $\mu$ assumes its smallest possible value, then

(2)    $\sum_{i=1}^{n} X_k(x_i) = V_k(T_1, \dots, T_\nu)$    $(k = 1, \dots, \mu)$,

where the $V$'s are single-valued functions of the $T$'s. If the preceding conditions
are satisfied, it follows from (1) and (2) that

(3)    $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + \sum_{i=1}^{n} X(x_i)\Big]$.
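As an illustration that is not part of the original paper, the normal distribution with unknown mean $\theta_1$ and variance $\theta_2$ can be written in the exponential form of (1) with $\mu = \nu = 2$. The labelling of the $\Theta_k$, $X_k$ and $V_k$ below is one convenient choice, assumed here for the sake of the example.

```latex
% Normal density, \theta_1 = mean, \theta_2 = variance, arranged as in (1)
f(x \mid \theta_1, \theta_2)
  = \exp\Big[ \Theta_1 X_1 + \Theta_2 X_2 + \Theta + X \Big],
\qquad
\Theta_1 = \frac{\theta_1}{\theta_2},\; X_1 = x,\quad
\Theta_2 = -\frac{1}{2\theta_2},\; X_2 = x^2,\quad
\Theta = -\frac{\theta_1^2}{2\theta_2} - \tfrac{1}{2}\log(2\pi\theta_2),\; X = 0.

% Sufficient statistics T_1 = \sum x_i, T_2 = \sum x_i^2, so that, as in (2),
V_1 = \sum_{i=1}^{n} X_1(x_i) = T_1, \qquad
V_2 = \sum_{i=1}^{n} X_2(x_i) = T_2.

% The joint density of the sample then takes the form (3)
f(x_1, \dots, x_n \mid \theta_1, \theta_2)
  = \exp\Big[ \Theta_1 V_1 + \Theta_2 V_2 + n\Theta \Big].
```

In this instance the transformation from $(T_1, T_2)$ to $(V_1, V_2)$ is the identity, so it is trivially one-to-one.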

Now it is known [2] that if the $T$'s possess continuous partial derivatives and
are such that it is possible to introduce additional functions $T_{\nu+1}, \dots, T_n$ which
will make the transformation

(4)    $T_1 = T_1(x_1, \dots, x_n), \;\dots,\; T_n = T_n(x_1, \dots, x_n)$

one-to-one, then $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu)$ can be written in the form

(5)    $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu) = f_1(T_1, \dots, T_\nu \mid \theta_1, \dots, \theta_\nu)\, f_2(x_1, \dots, x_n \mid T_1, \dots, T_\nu)$,

where $f_1$ is the distribution function of the $T$'s and $f_2$ is the conditional distribution
function of the $x$'s for fixed values of the $T$'s. The function $f_2$ does not depend
upon any of the parameters $\theta_1, \dots, \theta_\nu$.

For the purpose of constructing similar regions, it is desirable to work with $f_1$.
By combining (3) and (5), $f_1$ may be expressed in the form

(6)    $f_1(T_1, \dots, T_\nu \mid \theta_1, \dots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + H\Big]$,

where $H = \sum_{i=1}^{n} X(x_i) - \log f_2$ can be expressed as a function of $T_1, \dots, T_\nu$ only,
and where it is assumed that $f_2 > 0$.

The method employed by Neyman to obtain a similar region of size $\alpha$ is to
build it up as the locus of subregions of size $\alpha$ on the "surfaces" obtained by giving
the $T$'s constant values. Since the size of such a subregion is obtained by integrating
$f_2$ over the subregion, it will depend only upon the $T$'s; consequently a
subregion can be selected that will be of size $\alpha$ for every set of values of the $T$'s.

Now consider the construction of a similar region of size $\alpha$ by building up the
region as the locus of subregions of varying size rather than of constant size on
the surfaces that are obtained by giving the $T$'s constant values. Let $w_1$ and $w_2$
be two regions of size $\alpha$ and let $\alpha_1(T_1, \dots, T_\nu)$ and $\alpha_2(T_1, \dots, T_\nu)$ denote the
sizes of the surface subregions. It will be assumed that the regions under consideration
are such that $\alpha_1$ and $\alpha_2$ are obtainable from integrating $f_2$ over the subregion
common to $w_1$ and $w_2$ respectively and the surface determined by fixing
the values of the $T$'s. The problem then is to determine whether two different
functions, $\alpha_1$ and $\alpha_2$, can yield similar regions of size $\alpha$.

Since a critical region can be obtained as the locus of subregions, $\alpha_1$ and $\alpha_2$ will
yield similar regions of size $\alpha$ only if

(7)    $\int \cdots \int \alpha_i(T_1, \dots, T_\nu)\, f_1(T_1, \dots, T_\nu \mid \theta_1, \dots, \theta_\nu)\, dT_1 \cdots dT_\nu = \alpha$    $(i = 1, 2)$,

where the integration extends over the range of values of the $T$'s. By means of
(6), condition (7) may be written as

(8)    $\int \cdots \int \alpha_i \exp\Big[\sum \Theta_k V_k + n\Theta + H\Big]\, dT_1 \cdots dT_\nu = \alpha$    $(i = 1, 2)$.

If $e^{n\Theta}$ is factored out, it is clear that condition (8) will hold only if

(9)    $\int \cdots \int \alpha_1 \exp\Big[\sum \Theta_k V_k + H\Big]\, dT_1 \cdots dT_\nu = \int \cdots \int \alpha_2 \exp\Big[\sum \Theta_k V_k + H\Big]\, dT_1 \cdots dT_\nu$

is an identity in the $\theta$'s, and hence in the $\Theta_k$ for the region in the $\Theta_k$ space that
corresponds to the region in the parameter space for which the parameters $\theta_1,
\dots, \theta_\nu$ are defined.

Now assume that $\mu = \nu$ and that the transformation

(10)    $V_1 = V_1(T_1, \dots, T_\nu), \;\dots,\; V_\nu = V_\nu(T_1, \dots, T_\nu)$

is one-to-one. From the preceding assumptions that gave rise to (2) and (4), it
may be shown that the $V$'s are continuous and possess continuous partial derivatives.
In terms of the $V$'s, (9) may therefore be written as

(11)    $\int \cdots \int \exp\Big[\sum \Theta_k V_k\Big] K_1\, dV_1 \cdots dV_\nu = \int \cdots \int \exp\Big[\sum \Theta_k V_k\Big] K_2\, dV_1 \cdots dV_\nu$,

where $K_i = \alpha_i e^{H}$ has been expressed in terms of the $V$'s.

Since the parameters will be defined over intervals and $\Theta_k$ is an analytic function
of those parameters, to every region in the parameter space determined by
intervals of the $\theta$'s there will correspond an interval for $\Theta_k$ throughout which $\Theta_k$
will be defined; consequently (11) will be an identity in the $\Theta_k$ for intervals of
values. For every point within regions determined by $\Theta_k$ intervals, the partial
derivatives of the two sides of (11) must therefore be equal, provided the derivatives
exist and provided the $\Theta_k$ are functionally independent.

If the conditions to be imposed shortly are satisfied, it can easily be shown that
it is permissible to differentiate (11) repeatedly under the integral signs with respect
to the $\Theta_k$. As a consequence, (11) implies that for all sets of non-negative
integers $k_1, \dots, k_\nu$,

(12)    $\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Big[\sum \Theta_k V_k\Big] K_1\, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Big[\sum \Theta_k V_k\Big] K_2\, dV_1 \cdots dV_\nu$

will be an identity in the $\Theta_k$ for almost all values of the $\Theta_k$. But (12) is equivalent
to requiring that

(13)    $\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu}\, g_1(V_1, \dots, V_\nu)\, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu}\, g_2(V_1, \dots, V_\nu)\, dV_1 \cdots dV_\nu$

shall hold for all sets of non-negative integers $k_1, \dots, k_\nu$, where $g_1$ and $g_2$ are
the integrands of (11) after they have been divided by the function of the $\Theta_k$ obtained
from integrating (11). Since $g_1$ and $g_2$ will then be non-negative functions
of the $V$'s whose integrals over all values of the $V$'s are one, they are distribution
functions of the $V$'s. If $g_1$ and $g_2$ possess moments of all orders and are such that
they are uniquely determined by their moments, then condition (13) implies that

(14)    $g_1(V_1, \dots, V_\nu) \equiv g_2(V_1, \dots, V_\nu)$.

This identity will hold for almost all values of the parameters. If the conditions
necessary to justify (14) are satisfied, it therefore follows that

$\alpha_1(T_1, \dots, T_\nu) \equiv \alpha_2(T_1, \dots, T_\nu)$,

and that Neyman's method of constructing similar regions by choosing
$\alpha(T_1, \dots, T_\nu) = \alpha$ yields all possible similar regions of the class of regions
being considered.

The conditions that were imposed on $f(x \mid \theta_1, \dots, \theta_\nu)$ in order to establish
uniqueness may be summarized as follows: The distribution function
$f(x \mid \theta_1, \dots, \theta_\nu)$ is analytic and possesses a set of sufficient statistics, $T_1, \dots, T_\nu$,
with respect to the parameters $\theta_1, \dots, \theta_\nu$, that are continuous and possess continuous
partial derivatives. There exist one-to-one transformations of the types
(4) and (10). The function $c\, e^{\sum \Theta_i V_i + H}$, treated as a distribution function of the
$V$'s, possesses moments of all orders and is uniquely determined by its moments.
Finally, the $\Theta_k$ are functionally independent with the smallest possible value of
$\mu$ equal to $\nu$.

If the assumption that the $\Theta_k$ are independent is not realized, the distribution
function (1) could be expressed in terms of fewer than $\nu$ parameters. This is
also true if $\mu < \nu$. The two assumptions that $\mu = \nu$ and that the $\Theta_k$ are independent
will therefore be satisfied if (1) is expressed in terms of the minimum number
of parameters. The remaining assumptions can often be checked quite easily
whenever a particular distribution function is given.

In deriving tests of hypotheses for certain parameters, the distribution function
$f(x \mid \theta_1, \dots, \theta_\nu)$ will of course contain those parameters in addition to the parameters
$\theta_1, \dots, \theta_\nu$, but since they will have fixed values, it was not necessary to
introduce them into the discussion.


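4. Equivalence of methods. Although the equivalence of the two methods
of constructing similar regions has been implied in the literature [1], no simple
demonstration seems to be available. Such a demonstration is easily given by
means of (3). Let

$\varphi_i = \dfrac{\partial \log f}{\partial \theta_i}$,

where $f$ is given by (3) with $\mu = \nu$, and let

$\varphi_{ij} = \dfrac{\partial \varphi_i}{\partial \theta_j}$.

Differentiation of (3) yields

(15)    $\varphi_i = \sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k + n \dfrac{\partial \Theta}{\partial \theta_i}$,    $\varphi_{ij} = \sum_{k=1}^{\nu} \dfrac{\partial^2 \Theta_k}{\partial \theta_i\, \partial \theta_j} V_k + n \dfrac{\partial^2 \Theta}{\partial \theta_i\, \partial \theta_j}$.

The differential equations that are assumed to hold in the other method of construction
[1] may be written in the form

(16)    $\varphi_{ij} = A_{ij} + \sum_{r=1}^{\nu} B_{ijr}\, \varphi_r$    $(i, j = 1, \dots, \nu)$,

where the $A_{ij}$ and $B_{ijr}$ are functions of the $\theta$'s only. Upon substituting the
values given by (15), it will be found that (16) will be satisfied if

(17)    $\dfrac{\partial^2 \Theta_k}{\partial \theta_i\, \partial \theta_j} = \sum_{r=1}^{\nu} B_{ijr} \dfrac{\partial \Theta_k}{\partial \theta_r}$    $(k = 1, \dots, \nu)$

and

$n \dfrac{\partial^2 \Theta}{\partial \theta_i\, \partial \theta_j} = A_{ij} + n \sum_{r=1}^{\nu} B_{ijr} \dfrac{\partial \Theta}{\partial \theta_r}$.

Since (17) represents a set of $\nu$ equations in the $B$'s, whose coefficient matrix is
non-singular because of the functional independence of the $\Theta_k$, it follows that
sets of $A$'s and $B$'s can be found to satisfy equations (16). This shows that the
sufficiency assumption includes the differential equations assumption.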

Now the method of constructing similar regions here consists in building them
up as the locus of subregions of size $\alpha$ on the surfaces obtained by giving the $\varphi_i$
constant values. But from (15) it follows that the surface $\varphi_i = c_i$ $(i = 1, \dots, \nu)$
is equivalent to the surface

$\sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k + n \dfrac{\partial \Theta}{\partial \theta_i} = c_i$    $(i = 1, \dots, \nu)$,

which may be written in the form

(18)    $\sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k = c_i - n \dfrac{\partial \Theta}{\partial \theta_i}$    $(i = 1, \dots, \nu)$,

because $\Theta$ is a function of the parameters only. Since the coefficient matrix of the
$V$'s in (18) is nonsingular, (18) may be solved for the $V$'s; consequently the surface
$\varphi_i = c_i$ $(i = 1, \dots, \nu)$ is equivalent to the surface $V_i = c_i'$ $(i = 1, \dots, \nu)$.
But from the assumption concerning the transformation (10), the surface
$V_i = c_i'$ $(i = 1, \dots, \nu)$ is equivalent to the surface $T_i = c_i''$ $(i = 1, \dots, \nu)$.
Thus, the two surfaces $\varphi_i = c_i$ $(i = 1, \dots, \nu)$ and $T_i = c_i''$ $(i = 1, \dots, \nu)$ are
equivalent and hence the two methods of constructing similar regions are
equivalent.


REFERENCES 

[1] J. Neyman, "On a statistical problem arising in routine analysis and in sampling
inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[2] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability," Roy. Soc. Phil. Trans., Vol. 236A (1937), p. 364.

[3] H. Scheffé, "On the theory of testing composite hypotheses with one constraint,"
Annals of Math. Stat., Vol. 13 (1942), pp. 280-293.

[4] B. O. Koopman, "On distributions admitting a sufficient statistic," Trans. Amer. Math.
Soc., Vol. 39 (1936), pp. 399-409.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


CONVERGENCE OF DISTRIBUTIONS 

By Herbert Robbins 

University of North Carolina 
Let $f_n(x)$ $(n = 0, 1, 2, \dots)$ be frequency functions

(1)    $f_n(x) \ge 0$,    $\int_{-\infty}^{\infty} f_n(x)\, dx = 1$.

There are various ways in which the sequence of distributions corresponding to
the $f_n(x)$ $(n = 1, 2, \dots)$ may be said to converge to the distribution corresponding
to $f_0(x)$. The definition customarily adopted in mathematical statistics
(see e.g. [1]) is equivalent to the condition

(a)    $\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x)\, dx = \int_{-\infty}^{\xi} f_0(x)\, dx$ for every $\xi$.¹

We shall also consider the two further conditions

(b)    $\lim_{n \to \infty} \int_{S} f_n(x)\, dx = \int_{S} f_0(x)\, dx$ for every Borel set $S$,

and

(c)    $\lim_{n \to \infty} \int_{S} f_n(x)\, dx = \int_{S} f_0(x)\, dx$ uniformly for all Borel sets $S$.

It is clear that (c) implies (b) and that (b) implies (a). That the converse
implications do not hold is shown by the following examples.

Example 1. Let $f_0(x) = 1$ for $0 \le x \le 1$ and 0 elsewhere. Choose and fix
any $0 < \epsilon < 1$, set $\delta_n = \epsilon / (n \cdot 2^n)$, and for $n = 1, 2, \dots$ let $f_n(x) = 1/(n \cdot \delta_n)$ for
$i/n - \delta_n \le x \le i/n$ $(i = 1, 2, \dots, n)$ and 0 elsewhere. If we denote by $S_n$
the set of all $x$ for which $f_n(x) > 0$ it is easy to see that for $n = 1, 2, \dots$

(2)    $0 \le \int_{-\infty}^{\xi} f_0(x)\, dx - \int_{-\infty}^{\xi} f_n(x)\, dx \le 1/n$ for every $\xi$,

(3)    $\int_{S_n} f_0(x)\, dx = \epsilon / 2^n$,    $\int_{S_n} f_n(x)\, dx = 1$.

¹ From a well known theorem of Pólya the convergence is then necessarily uniform for all $\xi$.



Hence for the Borel set $S = \sum_{n=1}^{\infty} S_n$ it follows that

(4)    $\int_{S} f_0(x)\, dx \le \sum_{n=1}^{\infty} \int_{S_n} f_0(x)\, dx = \epsilon$,

(5)    $\int_{S} f_n(x)\, dx = \int_{S_n} f_n(x)\, dx = 1$    $(n = 1, 2, \dots)$.

From (2) we see that (a) holds (uniformly for all $\xi$), and from (4) and (5) that
(b) fails about as badly as possible.
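The arithmetic of Example 1 can be verified exactly for particular values. The sketch below uses rational arithmetic; the choice $\epsilon = 1/2$, the grid of test points and the function names are assumptions made only for this illustration.

```python
from fractions import Fraction

def delta(n, eps):
    """Width of each of the n spikes of f_n: eps / (n * 2^n)."""
    return eps / (n * 2 ** n)

def cdf_f0(xi):
    """Distribution function of the uniform density f_0 on [0, 1]."""
    return min(max(xi, Fraction(0)), Fraction(1))

def cdf_fn(xi, n, eps):
    """Distribution function of f_n: mass 1/n spread evenly on [i/n - delta, i/n]."""
    d = delta(n, eps)
    total = Fraction(0)
    for i in range(1, n + 1):
        left = Fraction(i, n) - d
        covered = min(max(xi - left, Fraction(0)), d)  # part of spike i below xi
        total += covered / (n * d)
    return total

def mass_f0_on_Sn(n, eps):
    """Integral of f_0 over S_n, i.e. the total length of the n spikes."""
    return n * delta(n, eps)

def mass_fn_on_Sn(n, eps):
    """Integral of f_n over S_n: height 1/(n*delta) times total length n*delta."""
    return n * delta(n, eps) * (1 / (n * delta(n, eps)))

if __name__ == "__main__":
    eps = Fraction(1, 2)
    for n in (1, 2, 5):
        gaps = [cdf_f0(Fraction(j, 400)) - cdf_fn(Fraction(j, 400), n, eps)
                for j in range(401)]
        print(n, max(gaps) <= Fraction(1, n), mass_f0_on_Sn(n, eps))
```

On the grid the gap between the two distribution functions stays between 0 and 1/n, as in (2), while the set $S_n$ carries $f_0$-mass $\epsilon/2^n$ and $f_n$-mass 1, as in (3).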

This construction can be modified to apply to any $f_0(x)$; thus choosing $f_0(x) =
(2\pi e^{x^2})^{-1/2}$ we can construct $f_n(x)$ $(n = 1, 2, \dots)$ and a Borel set $S$ such that

$\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x)\, dx = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\xi} e^{-x^2/2}\, dx$ uniformly for all $\xi$,

while

$\dfrac{1}{\sqrt{2\pi}} \int_{S} e^{-x^2/2}\, dx = .01$,    $\int_{S} f_n(x)\, dx = 1$    $(n = 1, 2, \dots)$.

It is conceivable that some time a statistician, failing to consider such a possibility,
will be led to approximate .01 by 1.

If $X_n$ is a random variable with frequency function $f_n(x)$, if $y = g(x)$ is a Borel
function, and if (a) holds, then it follows from Example 1 that the distribution
function $H_n(y)$ of $Y_n = g(X_n)$, equal to the integral of $f_n(x)$ over the set $S_y$ of all
$x$ such that $g(x) \le y$, need not converge to the distribution function $H_0(y)$ of
$Y_0 = g(X_0)$. It is easily seen that this possibility is excluded if, as commonly
occurs in applications, $g(x)$ is such that for every $y$, the intersection of $S_y$ with
any finite interval is the sum of a finite number of intervals (e.g., if $g(x) = \sin x$).

Example 2. Let $f_0(x)$ be defined as in the previous example, and for $n =
1, 2, \dots$ let $f_n(x) = 1 + \sin(2\pi n x)$ for $0 \le x \le 1$ and 0 elsewhere. By the
Riemann-Lebesgue theorem it follows that (b) holds. But let $S_n$ denote the
set of all $x$ for which $f_n(x) > 1$; then

$\int_{S_n} f_0(x)\, dx = \tfrac{1}{2}$,    $\int_{S_n} f_n(x)\, dx = \tfrac{1}{2} + \dfrac{1}{\pi}$    $(n = 1, 2, \dots)$,

so that (c) does not hold.
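A quick numerical check of Example 2 is sketched below. It approximates the two integrals over $S_n$ with a midpoint rule; the step count and the function name are arbitrary choices for this illustration.

```python
from math import sin, pi

def masses_on_Sn(n, steps=200000):
    """Midpoint-rule estimates of the integrals of f_0 and f_n over S_n = {f_n > 1}."""
    h = 1.0 / steps
    p0 = 0.0
    pn = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        s = sin(2 * pi * n * x)
        if s > 0:                 # x lies in S_n, where f_n(x) = 1 + s exceeds 1
            p0 += h               # f_0(x) = 1 on [0, 1]
            pn += (1.0 + s) * h
    return p0, pn

if __name__ == "__main__":
    for n in (1, 7, 50):
        p0, pn = masses_on_Sn(n)
        print(n, round(p0, 4), round(pn, 4), round(0.5 + 1 / pi, 4))
```

The gap $P_n(S_n) - P_0(S_n)$ stays near $1/\pi$ for every $n$, so no single $n$ makes the difference small for all Borel sets at once.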

It follows from these examples that (a), (b), and (c) are successively stronger 
definitions of convergence. We shall now give some definitions equivalent to 
(b) and (c). 

First we recall that the non-negative, completely additive, and absolutely continuous
set functions

(6)    $P_n(S) = \int_{S} f_n(x)\, dx$    $(n = 1, 2, \dots)$,

are said to be uniformly absolutely continuous if for every $\epsilon > 0$ there exists a
$\delta > 0$ such that for any $S$ and any $n = 1, 2, \dots$,

(7)    $m(S) < \delta$ implies $P_n(S) < \epsilon$.

We shall denote the condition that the $P_n(S)$ be uniformly absolutely continuous
by (u.a.c.), and we shall now prove that (b) is equivalent to

(b')    (a) and (u.a.c.).

Proof. (A) Suppose (b) holds. It is clear that (a) holds, and we shall show
by contradiction that (u.a.c.) holds also. For if not then there would exist an
$\epsilon > 0$ such that for any $\eta > 0$ we could find a set $S$ and an integer $n$ such that

(8)    $m(S) < \eta$,    $P_n(S) \ge \epsilon$.

Moreover, since the set function

$P_0(S) = \int_{S} f_0(x)\, dx$

is absolutely continuous, there exists a $\delta > 0$ such that

(9)    $m(S) < \delta$ implies $P_0(S) < \epsilon/2$.

Now by (8) there exists an $S_1$ with $m(S_1) < \delta/2$ and a $k_1$ such that $P_{k_1}(S_1) \ge \epsilon$.
Next, there exists an $S_2$ with $m(S_2) < \delta/2^2$ and a $k_2$ such that $P_{k_2}(S_2) \ge \epsilon$, and
it is easy to see that we may assume that $k_2 > k_1$. Proceeding in this way we
find a sequence of integers $k_1 < k_2 < \cdots$ and of sets $S_1, S_2, \dots$ such that

(10)    $m(S_n) < \delta/2^n$,    $P_{k_n}(S_n) \ge \epsilon$    $(n = 1, 2, \dots)$.

Let $S = \sum_{n=1}^{\infty} S_n$; then by (10), $m(S) \le \sum_{n=1}^{\infty} m(S_n) < \delta$, so that by (9),

(11)    $P_0(S) < \epsilon/2$.

But by (10),

(12)    $P_{k_n}(S) \ge P_{k_n}(S_n) \ge \epsilon$    $(n = 1, 2, \dots)$.

From (11) and (12) we conclude that (b) does not hold, which is a contradiction.
Hence (b) implies (b').

(B) Suppose (b') holds. We shall show first that (b) holds for any set $S_1$
of finite measure. Choose any $\epsilon > 0$; then from (u.a.c.) it follows that there
exists a $\delta > 0$ such that

(13)    $m(S) < \delta$ implies $P_n(S) < \epsilon/8$    $(n = 0, 1, 2, \dots)$.

It is known from the theory of measure that corresponding to $S_1$ and to $\delta$ we can
find a set $S_2$ which is the sum of a finite number of disjoint intervals, such that

(14)    $m((S_1 - S_2) + (S_2 - S_1)) < \delta$.

From (13), (14), and the relations

(15)    $P_n(S_1) = P_n(S_2) + P_n(S_1 - S_2) - P_n(S_2 - S_1)$    $(n = 0, 1, 2, \dots)$,

it follows that

(16)    $|P_0(S_1) - P_n(S_1)| \le |P_0(S_2) - P_n(S_2)| + P_n(S_1 - S_2) + P_n(S_2 - S_1) + P_0(S_1 - S_2) + P_0(S_2 - S_1) < |P_0(S_2) - P_n(S_2)| + \epsilon/2$,

and from (a) that for large enough $n$,

(17)    $|P_0(S_2) - P_n(S_2)| < \epsilon/2$.

Thus from (16) and (17) it follows that for large enough $n$,

(18)    $|P_0(S_1) - P_n(S_1)| < \epsilon$,

which proves (b) for the case $m(S) < \infty$.

Now given any $\epsilon > 0$ choose $a$, $b$ so that, setting $A = \{a \le x \le b\}$, we have

(19)    $P_0(A) > 1 - \epsilon/4$.

Then it follows from (a) that for large enough $n$,

(20)    $P_n(A) > 1 - \epsilon/2$.

Then for any Borel set $S$ we have for large enough $n$,

$P_n(S) - P_0(S) = P_n(SA) + P_n(S - A) - P_0(SA) - P_0(S - A)$,

$|P_n(S) - P_0(S)| \le |P_n(SA) - P_0(SA)| + P_n(S - A) + P_0(S - A) < |P_n(SA) - P_0(SA)| + \epsilon/2 + \epsilon/4$.

But by the previous case, since $m(SA) < \infty$, for large enough $n$ we shall have
$|P_n(SA) - P_0(SA)| < \epsilon/4$. Hence for large enough $n$,

$|P_n(S) - P_0(S)| < \epsilon$,

so that (b) holds in this case also. This completes the proof.

We shall say that $\lim_{n \to \infty} f_n(x) = f_0(x)$ in measure if for every $\epsilon > 0$ and for
every set $A$ such that $m(A) < \infty$, the measure of the set of all $x$ in $A$ for which
$|f_n(x) - f_0(x)| > \epsilon$ tends to 0 as $n$ increases. (For a space of finite measure
this reduces to the usual definition.) We now observe that (c) is equivalent to

(c')    $\lim_{n \to \infty} f_n(x) = f_0(x)$ in measure.

In fact, it is easy to show that (c) is equivalent to convergence in the mean of
order 1,

(c'')    $\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\, dx = 0$,

which implies (c'), and a theorem of Scheffé [2] states that (c') implies (c).²
Finally, it is not hard to show that the condition

(d)    $\lim_{n \to \infty} f_n(x) = f_0(x)$ almost everywhere

implies (c') but not conversely.
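The equivalence of (c) and (c'') is stated above without proof. One standard way to see it, offered here as a sketch rather than as the author's argument, uses only the fact that every $f_n$ integrates to one; the set $A_n$ is notation introduced for this illustration.

```latex
% Let A_n = \{x : f_n(x) > f_0(x)\}.  Since \int (f_n - f_0)\,dx = 0,
\int_{A_n} (f_n - f_0)\,dx \;=\; \int_{A_n^{c}} (f_0 - f_n)\,dx
   \;=\; \tfrac{1}{2}\int_{-\infty}^{\infty} |f_n - f_0|\,dx .

% For any Borel set S the difference P_n(S) - P_0(S) is largest when S = A_n
% and smallest when S = A_n^{c}, so that
\sup_{S}\, \bigl| P_n(S) - P_0(S) \bigr|
   \;=\; \tfrac{1}{2}\int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\,dx .

% Hence the left side tends to 0, which is condition (c), exactly when (c'') holds.
```

For Example 2 this identity gives $\int |f_n - f_0|\,dx = 2/\pi$ for every $n$, consistent with the failure of (c) there.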

Summing up, we arrive at the following complete set of implication relations
among the various modes of convergence which we have considered:

(20)    $(d) \rightarrow (c'') \rightleftarrows (c') \rightleftarrows (c) \rightarrow (b') \rightleftarrows (b) \rightarrow (a)$.

REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, pp. 58-60.

[2] H. Scheffé, "A useful convergence theorem for probability distributions," Annals of
Math. Stat., Vol. 18 (1947), pp. 434-438.

[3] E. J. McShane, Integration, Princeton Univ. Press, 1944, p. 168.


ON RANDOM VARIABLES WITH COMPARABLE PEAKEDNESS 

By Z. W. Birnbaum 
University of Washington 

The quality of a distribution usually referred to as its peakedness has often
been measured by the fourth moment of the distribution. It is known, however,
that there is no definite connection between the value of the fourth moment and
what one may intuitively consider as the amount of peakedness of a distribution.¹
In the present paper a definition of relative peakedness is proposed and it is shown
that this concept has properties which may make it practically applicable.

Definition. Let $Y$ and $Z$ be real random variables and $Y_1$ and $Z_1$ real constants.
We shall say that $Y$ is more peaked about $Y_1$ than $Z$ about $Z_1$ if the inequality

$P(|Y - Y_1| \ge T) \le P(|Z - Z_1| \ge T)$

is true for all $T \ge 0$.

If, for example, $Y$ and $Z$ are normal random variables with expectations $Y_1$
and $Z_1$ and standard deviations $\sigma_y$ and $\sigma_z$, and if $\sigma_y < \sigma_z$, then $Y$ is more peaked
about $Y_1$ than $Z$ about $Z_1$. Similarly, if $Y$ is a random variable such that
$P(Y < a) = P(Y > b) = 0$ for $a < b$, and if $Z$ is the discrete random variable
with $P(Z = a) = P(Z = b) = \tfrac{1}{2}$, then $Y$ is more peaked about $\tfrac{1}{2}(a + b)$ than
$Z$ about the same point.
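The normal example can be checked numerically on a grid of values of $T$. The sketch below is only an illustration of the definition: the grid, the two standard deviations and the helper names are assumptions, and a finite grid can of course only refute, not prove, the inequality for all $T \ge 0$.

```python
from math import erfc, sqrt

def normal_tail(T, sigma):
    """P(|Y - mean| >= T) for a normal variable with standard deviation sigma."""
    return erfc(T / (sigma * sqrt(2.0)))

def more_peaked_on_grid(tail_y, tail_z, grid):
    """True if P(|Y - Y1| >= T) <= P(|Z - Z1| >= T) at every grid point T."""
    return all(tail_y(T) <= tail_z(T) for T in grid)

if __name__ == "__main__":
    grid = [0.1 * j for j in range(60)]
    narrow = lambda T: normal_tail(T, 1.0)   # sigma_y = 1
    wide = lambda T: normal_tail(T, 2.0)     # sigma_z = 2
    print(more_peaked_on_grid(narrow, wide, grid))   # narrow vs. wide
    print(more_peaked_on_grid(wide, narrow, grid))   # roles reversed
```

With these choices the first comparison holds at every grid point and the reversed comparison fails, as the definition leads one to expect.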

² Scheffé actually proves that (d) implies (c), but the Lebesgue convergence theorem on
which his proof is based holds for convergence in measure (see e.g. [3]).

¹ I. Kaplansky, "A common error concerning kurtosis," Am. Stat. Assn. Jour., Vol. 40
(1945), p. 259.



Lemma. Let $Y_1$, $Y_2$, $Z_1$, $Z_2$ be continuous random variables² with the probability
densities $\varphi_1(Y_1)$, $\varphi_2(Y_2)$, $f_1(Z_1)$, $f_2(Z_2)$ such that

1°. $Y_1$ and $Y_2$ are independent, $Z_1$ and $Z_2$ are independent,

2°. $\varphi_i(Y_i) = \varphi_i(-Y_i)$ for all $Y_i$, $f_i(Z_i) = f_i(-Z_i)$ for all $Z_i$ $(i = 1, 2)$,

3°. $\varphi_2(Y_2)$ and $f_1(Z_1)$ are not-increasing functions for positive values of the variables,
and

4°. $Y_i$ is more peaked about 0 than $Z_i$, for $i = 1, 2$.

Let $Y = Y_1 + Y_2$ and $Z = Z_1 + Z_2$. Under these assumptions $Y$ is more peaked
about 0 than $Z$.

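Proof: Let $\Phi_i(y) = P(Y_i \le y)$, $F_i(z) = P(Z_i \le z)$, for $i = 1, 2$, be the cumulative
probability functions. For any random variables $Y_1$, $Y_2$, $Z_1$, $Z_2$ (not necessarily
continuous) which fulfil assumption 1° we have, for any $T$, the relationships

$P(Y \le T) - P(Z \le T) = \int_{-\infty}^{\infty} [\Phi_1(T - s)\, d\Phi_2(s) - F_1(T - s)\, dF_2(s)]$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) + \int_{-\infty}^{\infty} F_1(T - s)\, [d\Phi_2(s) - dF_2(s)]$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) - \int_{-\infty}^{\infty} [\Phi_2(s) - F_2(s)]\, dF_1(T - s)$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) + \int_{-\infty}^{\infty} [\Phi_2(T - s) - F_2(T - s)]\, dF_1(s)$

$= I_1(T) + I_2(T)$,

where $I_1(T)$ and $I_2(T)$ denote the two integrals of the preceding line, and

$I_1(T) = \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s)$

$= \int_{-\infty}^{\infty} [\Phi_1(-s) - F_1(-s)]\, d\Phi_2(T + s)$

$= \int_{-\infty}^{0} + \int_{0}^{\infty}$

$= \int_{0}^{\infty} \{[F_1(s) - \Phi_1(s)]\, d\Phi_2(T - s) + [\Phi_1(-s) - F_1(-s)]\, d\Phi_2(T + s)\}$.

² As defined e.g. in H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, 1946, p. 169.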



If the random variables have distributions symmetrical about zero (assumption
2°) this is equal to

$\int_{0}^{+\infty} \{[P(Z_1 \le s) - P(Y_1 \le s)]\, dP(Y_2 \le T - s) + [P(Y_1 \le -s) - P(Z_1 \le -s)]\, dP(Y_2 \le T + s)\}$

$= \int_{0}^{+\infty} \{[1 - P(Z_1 > s) - 1 + P(Y_1 > s)]\, dP(Y_2 \le T - s) + [P(Y_1 \ge s) - P(Z_1 \ge s)]\, dP(Y_2 \le T + s)\}$

$= \int_{0}^{+\infty} \{[P(Y_1 \ge s) - P(Z_1 \ge s)]\, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - [P(Y_1 = s) - P(Z_1 = s)]\, dP(Y_2 \le T - s)\}$,

and we obtain

(1.1)    $I_1(T) = \int_{0}^{+\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - \int_{0}^{+\infty} [P(Y_1 = s) - P(Z_1 = s)]\, dP(Y_2 \le T - s)$.

By an analogous argument one derives the equality

(1.2)    $I_2(T) = \int_{0}^{+\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\, d[P(Z_1 \le T + s) + P(Z_1 \le T - s)] - \int_{0}^{+\infty} [P(Y_2 = s) - P(Z_2 = s)]\, dP(Z_1 \le T - s)$.

Making use of the assumption that $Y_1$, $Y_2$, $Z_1$, $Z_2$ are continuous random variables,
we conclude that the second integrals in (1.1) and (1.2) are zero, and we
may write

(2.1)    $I_1(T) = \int_{0}^{\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\, [\varphi_2(T + s) - \varphi_2(T - s)]\, ds$,

(2.2)    $I_2(T) = \int_{0}^{\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\, [f_1(T + s) - f_1(T - s)]\, ds$.

For $T \ge 0$ we have, making use of assumption 3°,

$\varphi_2(T + s) - \varphi_2(T - s) \le 0$ if $0 \le s \le T$,

$\varphi_2(T + s) - \varphi_2(T - s) = \varphi_2(s + T) - \varphi_2(s - T) \le 0$ if $0 \le T \le s$,

and similarly

$f_1(T + s) - f_1(T - s) \le 0$ for all $T \ge 0$ and $s \ge 0$.

Since according to assumption 4° we also have

$P(Y_1 \ge s) - P(Z_1 \ge s) \le 0$,
$P(Y_2 \ge s) - P(Z_2 \ge s) \le 0$ for $s \ge 0$,

both integrands in (2.1) and (2.2) are non-negative for all values of $s$, and we
conclude

$P(Y \le T) - P(Z \le T) = I_1(T) + I_2(T) \ge 0$,

and hence

(3.1)    $P(Y \ge T) - P(Z \ge T) \le 0$ for $T \ge 0$.

From assumption 2° one easily sees that $Y$ and $Z$ have symmetrical probability
distributions. This together with (3.1) leads to

$P(Y \ge T) - P(Z \ge T) = P(Y \le -T) - P(Z \le -T) \le 0$,

and thus to

$P(|Y| \ge T) - P(|Z| \ge T) \le 0$ for $T \ge 0$.

As can be seen from (1.1) and (1.2), the assumptions of the Lemma, in particular
the assumption that all variables are continuous and the assumption 3°,
are rather special sufficient conditions for $Y$ being more peaked about 0 than $Z$.

Theorem 1. Let $Y$ and $Z$ be continuous random variables with probability
densities $\varphi(Y)$ and $f(Z)$ such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$, $f(-Z) = f(Z)$ for all $Z$,

2°. $\varphi(Y)$ and $f(Z)$ are not-increasing functions for positive values of the variables,

3°. $Y$ is more peaked about 0 than $Z$.

Let $Y_1, Y_2, \dots, Y_n$ and $Z_1, Z_2, \dots, Z_n$ be random samples of $Y$ and $Z$, respectively,
and $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$, $\bar{Z}_n = \frac{1}{n} \sum_{j=1}^{n} Z_j$. Then $\bar{Y}_n$ is more peaked about 0 than
$\bar{Z}_n$.

Proof. From the preceding Lemma one concludes by simple induction that
$Y' = Y_1 + Y_2 + \cdots + Y_n$ as well as $Z' = Z_1 + Z_2 + \cdots + Z_n$ are continuous
random variables with distributions symmetrical about zero and probability
densities not-increasing for positive values of the variables, such that $Y'$ is more
peaked about 0 than $Z'$. From this the theorem follows immediately.

The conjecture that assumption 2° of Theorem 1 might be superfluous is incorrect
as may be seen from the following example:

Let $Y$ be any continuous random variable with a distribution symmetrical
about zero and such that $P(|Y| > a) = 0$ for some $a > 0$. Let $Z$ be the discrete
random variable with $P(Z = -a) = P(Z = a) = \tfrac{1}{2}$. We have for $0 \le
T \le a$

$P(|Y| \ge T) \le 1 = P(|Z| \ge T)$,

hence $Y$ is more peaked about 0 than $Z$. If $Y_1$, $Y_2$ and $Z_1$, $Z_2$ are random samples
of size 2, we have

$P(\bar{Z}_2 = -a) = P(\bar{Z}_2 = a) = \tfrac{1}{4}$,    $P(\bar{Z}_2 = 0) = \tfrac{1}{2}$,

and thus

$P(|\bar{Z}_2| \ge T) = \tfrac{1}{2}$ for $0 < T \le a$.





The random variable $\bar{Y}_2$ is continuous, with a distribution symmetrical about
zero, such that $P(|\bar{Y}_2| \le a) = 1$. There exists, therefore, a $T_1$ such that
0 < Ti g a and that P(| Y 2 ] ^ Pi) = It follows that 

P(| f* I i aro - i > * - P(IZ 2 1 ^ TO, 


hence Fj is not more peaked about zero than 2 2 . The random variable Z is 
discrete, but it can be approximated by a continuous random variable with a 
U-shaped probability density, so that all the probabilities will be modified only 
very slightly and F 2 still will not be more peaked than Z 2 . Nothing will change 
in this example if one assumes that Y fulfils condition 2° of Theorem 1. 
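The counterexample can be checked numerically in one concrete instance. The sketch below is only an illustration under assumptions not in the original: it takes $a = 1$, lets $Y$ be uniform on $(-1, 1)$ (so the mean of two draws has a triangular density), and uses hypothetical function names.

```python
def tail_single_uniform(t: float) -> float:
    """P(|Y| >= t) for Y uniform on (-1, 1)."""
    t = min(max(t, 0.0), 1.0)
    return 1.0 - t


def tail_single_two_point(t: float) -> float:
    """P(|Z| >= t) for Z = -1 or +1, each with probability 1/2."""
    return 1.0 if t <= 1.0 else 0.0


def tail_mean_two_uniform(t: float) -> float:
    """P(|mean of two uniform draws| >= t); the mean is triangular on (-1, 1)."""
    t = min(max(t, 0.0), 1.0)
    return (1.0 - t) ** 2


def tail_mean_two_point(t: float) -> float:
    """P(|mean of two two-point draws| >= t); the mean is -1, 0, +1 w.p. 1/4, 1/2, 1/4."""
    if t <= 0.0:
        return 1.0
    return 0.5 if t <= 1.0 else 0.0


# Single observations: Y is more peaked than Z at every threshold tried.
grid = [i / 20 for i in range(21)]
single_ok = all(tail_single_uniform(t) <= tail_single_two_point(t) for t in grid)

# Means of two observations: the ordering fails for small thresholds.
violations = [t for t in grid if tail_mean_two_uniform(t) > tail_mean_two_point(t)]
```

In this instance the ordering of the means fails exactly where $(1 - T)^2 > \frac{1}{2}$, which is the role played by $T_1$ in the text.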

Theorem 2. Let $Y$ be a continuous random variable such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$,

2°. $\varphi(Y)$ is a not-increasing function for $Y > 0$,

3°. $P(|Y| > a) = 0$ for some $a > 0$.

Let $Y_1, Y_2, \cdots, Y_n$ be a random sample of size $n$ and $\bar{Y}_n = \frac{1}{n}\sum_{j=1}^{n} Y_j$. Then,
for any $y \ge 0$, we have

(4.1) $$P(|\bar{Y}_n| \ge y) \le \psi_n\!\left(\frac{y}{a}\right),$$

where

(4.2) $$\psi_n(t) = \frac{2}{n!} \sum_{(n/2)(t+1) < k \le n} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t+1) - k\right]^n.$$

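Proof. Let $Z$ be the random variable with uniform distribution in the
interval $-1 \le Z \le 1$. If $Z_1, Z_2, \cdots, Z_n$ is a random sample, then
$Z' = Z_1 + Z_2 + \cdots + Z_n$ has the cumulative probability function*

$$P(Z' \le z) = \begin{cases} 0, & z < -n, \\ \dfrac{1}{2^n\, n!} \displaystyle\sum_{0 \le k \le (z+n)/2} (-1)^k \binom{n}{k} (z + n - 2k)^n, & -n \le z \le n, \\ 1, & z > n, \end{cases}$$

and $\bar{Z}_n = Z'/n$ has the cumulative probability function

$$P(\bar{Z}_n \le t) = \begin{cases} 0, & t < -1, \\ \dfrac{1}{n!} \displaystyle\sum_{0 \le k \le (n/2)(t+1)} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t+1) - k\right]^n, & -1 \le t \le 1, \\ 1, & t > 1. \end{cases}$$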


* This expression is due to Laplace. For derivation and discussion, see: J. V. Uspensky,
Introduction to Mathematical Probability, McGraw-Hill, 1937, p. 279, and Cramér, op. cit.,
p. 245.





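Thus,

$$P(|\bar{Z}_n| \ge t) = 2[1 - P(\bar{Z}_n \le t)] = 2\left\{1 - \frac{1}{n!} \sum_{0 \le k \le (n/2)(t+1)} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t+1) - k\right]^n\right\}$$

and in view of the identity

$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (u - k)^n = n!$$

this becomes

$$P(|\bar{Z}_n| \ge t) = \frac{2}{n!} \sum_{(n/2)(t+1) < k \le n} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t+1) - k\right]^n = \psi_n(t)$$

for $0 \le t \le 1$. The random variable $Y/a$ is obviously more peaked about zero
than $Z$. Since $Y/a$ and $Z$ fulfil the assumptions of Theorem 1, it follows that
$\bar{Y}_n/a$ is more peaked about zero than $\bar{Z}_n$, that is

$$P\left(\left|\frac{\bar{Y}_n}{a}\right| \ge t\right) \le P(|\bar{Z}_n| \ge t) = \psi_n(t) \quad \text{for } t \ge 0.$$

Setting $at = y$, one obtains (4.1).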

For $n \to \infty$ the function $\psi_n(t)$ approaches asymptotically the probability
$P(|X| \ge t\sqrt{3n})$ for the normalized normal random variable $X$.⁴ For $n = 8$
one obtains the following values which indicate a good approximation:

    $t$                         .3998    .5254    .6711
    $P(|X| \ge t\sqrt{24})$     .05      .01      .001
    $\psi_8(t)$                 .049     .0092    .0005

For smaller values of $n$, $\psi_n(t)$ can be easily computed.
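As an illustration of how $\psi_n(t)$ might be evaluated, the following is a minimal sketch that sums the terms of (4.2) directly. The function name `psi` is an assumption introduced here, and plain floating-point summation is used, which is adequate for small $n$ but may lose accuracy for large $n$ because the terms alternate in sign.

```python
from math import comb, factorial


def psi(n: int, t: float) -> float:
    """Evaluate psi_n(t) of (4.2): P(|mean of n Uniform(-1, 1) draws| >= t)."""
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = n * (t + 1.0) / 2.0
    total = 0.0
    for k in range(n + 1):
        if k > u:  # only the terms with (n/2)(t+1) < k <= n enter the sum
            total += (-1) ** k * comb(n, k) * (u - k) ** n
    return 2.0 * total / factorial(n)


# Closed forms for the two smallest cases: psi_1(t) = 1 - t, psi_2(t) = (1 - t)^2.
check_n1 = psi(1, 0.3)
check_n2 = psi(2, 0.25)
# First entry of the n = 8 table in the text.
check_n8 = psi(8, 0.3998)
```

For $n = 1$ and $n = 2$ the sum reduces to $1 - t$ and $(1 - t)^2$, which gives a quick sanity check before using larger $n$.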


4 Cramér, op. cit., p. 245.


A METHOD FOR OBTAINING RANDOM NUMBERS

By H. Burke Horton
Interstate Commerce Commission

The need for large quantities of random numbers to be used in sample design,
subsampling, and other statistical problems is well known. Tippett’s [1] numbers
have been widely used for these purposes, despite criticism directed at
their lack of randomness. The following procedure may be of interest to those
who wish to develop their own random series. The method described below will
ultimately be used to record extensive tables of random numbers for general use.

Current methods of producing random numbers usually depend upon single 
operations of mechanical or electronic devices. These may be described as 
“single-stage” random number processes. The numerical results are biased to 
the same extent as the devices from which they are taken. 

At this point it is desirable to describe a process which may be called “com¬ 
pound” randomization. Assume two roulette wheels arranged in series so that 
the first controls the arrangement of symbols on the second wheel, while a turn 
of the second wheel determines which of its positions is to be observed. If the 
decimal system is used, the first wheel would have 10! “equally likely” positions, 
and the second would have 10 “equally likely” positions. If three such wheels 
were to be chained, the first would require (10!)! positions, the second 10! positions,
and the third 10 positions. In general, if n wheels were to be chained,
the first would require 10(!)^{n-1} “equally likely” positions, that is, 10 followed by
n - 1 successive factorial signs. It is not practical
to design such a machine. 1

One method of surmounting these difficulties is to shift to the binary system
in order to take advantage of the fact that 2! = 2; or, in general, 2(!)^n = 2.
This property makes feasible the chaining of any number of machines in series; 
and, furthermore, the machines can be of the same design. If desired, the re¬ 
sults taken from a single machine may be chained. Another important feature 
is the ease of handling binary chains by electronic systems. 

The words “equally likely” have been placed in quotation marks thus far to 
indicate that the probabilities are as nearly equal as manufacturing precision 
permits. Any simple single-stage device will have some bias, and it is this very 
lack of true equality that the chaining process is designed to meet. For con¬ 
venience we may take as our binary symbols +1 and — 1 rather than the custom¬ 
ary 1 and 0. We adhere to the usual rules regarding the sign of a product. 

Let $p_i$ be the probability of obtaining +1 in the $i$th trial (or in the $i$th machine
of a chain of machines), $0 < p_i < 1$; $q_i = 1 - p_i$ represents the probability
of obtaining -1 in the $i$th trial.

Let $P_i$ be the probability of obtaining +1 as the product of $i$ trials; $Q_i = 1 - P_i$
is the probability of obtaining -1 as the product of $i$ trials. The following
relationships can be set down immediately:

$$\begin{aligned}
P_1 &= p_1, & Q_1 &= q_1, \\
P_2 &= P_1 p_2 + Q_1 q_2, & Q_2 &= P_1 q_2 + Q_1 p_2, \\
P_3 &= P_2 p_3 + Q_2 q_3, & Q_3 &= P_2 q_3 + Q_2 p_3, \\
&\;\;\vdots & &\;\;\vdots \\
P_i &= P_{i-1} p_i + Q_{i-1} q_i, & Q_i &= P_{i-1} q_i + Q_{i-1} p_i.
\end{aligned}$$

1 It has been pointed out by Dr. George W. Brown that a practical solution is possible
using any number base, n, by addition of random digits (0, 1, 2, ···, n - 1) modulo n.





We may calculate the bias, $P_k - \frac{1}{2}$, for a chain of $k$ trials:

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_k - Q_k) = \tfrac{1}{2}(P_{k-1} p_k + Q_{k-1} q_k - P_{k-1} q_k - Q_{k-1} p_k).$$

Factoring, we have

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-1} - Q_{k-1})(p_k - q_k).$$

Substituting for $P_{k-1} - Q_{k-1}$ and factoring again,

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-2} - Q_{k-2})(p_{k-1} - q_{k-1})(p_k - q_k).$$

Continuing the process of substituting and factoring, we obtain

(1) $$P_k - \tfrac{1}{2} = \tfrac{1}{2}(p_1 - q_1)(p_2 - q_2) \cdots (p_k - q_k) = \tfrac{1}{2}\prod_{i=1}^{k}(p_i - q_i) = \tfrac{1}{2}\prod_{i=1}^{k}(2p_i - 1).$$

We may write the general formula for $P_k$:

(2) $$P_k = \tfrac{1}{2}\left[1 + \prod_{i=1}^{k}(2p_i - 1)\right].$$

In the special case where all the $p_i$ are equal to a constant, $p$,

(3) $$P_k = \tfrac{1}{2}\left[1 + (2p - 1)^k\right].$$

This can also be derived directly by expansion of $(p - q)^k$.

If any machine, $r$, in the chain has no bias ($p_r = \frac{1}{2}$, exactly), the chain itself
has no bias, since $2p_r - 1 = 0$. Note also that if for all $i$, $0 < p_i < 1$, the bias
of the complete chain is less than the bias of any component (single or multiple)
taken from the chain, because $|2p_i - 1| < 1$. Or stated another way, the
results taken from any machine, no matter how nearly perfect, can be improved
by chaining with another machine, no matter how biased the latter. Even in
the limiting case, $p = 1$ (or 0), the magnitude of the bias remains unchanged;
in all other cases it is reduced. The bias of final results can be made as small as
desired by increasing the length of the chain. Compound randomization can be
regarded as an attrition process which may be used to reduce final bias below
any preassigned quantity. If the observations taken from two machines in the
chain should be perfectly correlated, the only effect is to shorten the chain by
two.
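The recursion for $P_i$ and the closed form (2) can be compared numerically. The sketch below is illustrative only; the function names and the sample probabilities are assumptions chosen for the example, not values from the text.

```python
from math import prod


def chain_probability_recursive(ps):
    """P_k from the recursion P_i = P_{i-1} p_i + Q_{i-1} q_i."""
    big_p = ps[0]
    for p in ps[1:]:
        big_p = big_p * p + (1.0 - big_p) * (1.0 - p)
    return big_p


def chain_probability_closed(ps):
    """P_k from formula (2): one half of 1 plus the product of (2 p_i - 1)."""
    return 0.5 * (1.0 + prod(2.0 * p - 1.0 for p in ps))


def chain_bias(ps):
    """Bias P_k - 1/2 of the whole chain."""
    return chain_probability_closed(ps) - 0.5


# Hypothetical chain of three biased machines.
machines = [0.6, 0.55, 0.7]
p_rec = chain_probability_recursive(machines)
p_closed = chain_probability_closed(machines)

# One exactly unbiased machine removes the bias of the whole chain.
unbiased_chain = chain_bias([0.9, 0.5, 0.8])
```

With these sample values the individual biases are 0.1, 0.05 and 0.2, while the chained bias is 0.004, smaller than each of them.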

In shifting from the binary system to the decimal system, symbol bias will be 
introduced. In general, symbol bias will be introduced in passing from a given 
positional system to any other positional system, unless one of the number bases 
is a rational power of the other. 

To illustrate, let us assume that we have a random binary series and wish to
obtain a random one-digit decimal series. It will be necessary to tabulate the
binary series in blocks of four symbols. The quantities will range from 0000
(binary) to 1111 (binary), or from 00 (decimal) to 15 (decimal), with equal
probabilities. There would be no predominance of either ones or zeros in the
overall binary tabulation, as illustrated in the table below.

                                Binary System          Decimal System

                                0000                   0
                                0001                   1
                                0010                   2
                                0011                   3
                                0100                   4
                                0101                   5
                                0110                   6
                                0111                   7
                                1000                   8
                                1001                   9
    Tabulation to this point    25 zeros, 15 ones      One of each symbol

                                1010                   10
                                1011                   11
                                1100                   12
                                1101                   13
                                1110                   14
                                1111                   15
    Overall tabulation          32 zeros, 32 ones      (Right digit only)
                                                       0-5, 2 each; 6-9, 1 each

However, if we look at the right digit of the decimal tabulation, it is clear that the
symbols 0 to 5, inclusive, will occur twice as often as the symbols 6 to 9, inclusive. 
The easiest way of correcting for this bias is simply to reject all two-digit decimal 
numbers which occur, thereby giving equal probabilities to the ten decimal sym¬ 
bols. The rejection could be accomplished most easily by electronic devices 
operating on the binary numbers. All numbers greater than 1001 (binary) 
would be excluded through the operation of a simple four-stage electronic 
counter. 

This simple illustration also demonstrates the inefficiency of converting random
four-digit binary numbers to random one-digit decimal numbers. 37.5%
of the data are lost in the process of removing bias. A more efficient procedure
would be to tabulate the random binary series in blocks of ten digits. The
largest number that could occur would be 1 111 111 111 (binary), or 1,023 (decimal).
The numbers would have equal probabilities insofar as this is attainable
by chaining. To obtain a random three-digit decimal series it would be necessary
to reject the numbers above 999 (decimal). This would amount to only
2.34% of the available data. As before, rejection could be accomplished easily
in the binary series by use of a ten-stage electronic counter.
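The two loss figures quoted above follow from counting the rejected block values. A minimal sketch, with a hypothetical helper name, is:

```python
def rejection_fraction(bits: int, digits: int) -> float:
    """Fraction of equally likely `bits`-bit blocks rejected to keep `digits` decimal digits."""
    total = 2 ** bits      # number of distinct binary blocks
    kept = 10 ** digits    # blocks 0 .. 10**digits - 1 are retained
    if kept > total:
        raise ValueError("block too short for the requested number of decimal digits")
    return (total - kept) / total


loss_four_bits = rejection_fraction(4, 1)    # blocks 1010 .. 1111 rejected
loss_ten_bits = rejection_fraction(10, 3)    # blocks above 999 rejected
```

Six of the sixteen four-bit blocks are rejected (37.5%), against 24 of the 1,024 ten-bit blocks (about 2.34%).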

Several promising devices are being considered for tabulating random numbers 
in accordance with the principles discussed herein. Electronic or electrical 
systems actuated by cosmic rays seem to be the most desirable. Tabulating 
equipment may be wired to turn out random numbers, possibly as a by-product 
of other card runs. 

If only a few random numbers are needed, they can be obtained by much
simpler methods. For example, a coin may be tossed, letting heads and tails
represent +1 and -1, respectively. The product of k successive tosses would
be tabulated as the random binary variable. Products equal to +1 and -1
would be coded as 1 and 0, respectively. Blocks of binary symbols would then
be converted to the decimal system as described above.
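The coin-tossing procedure can be sketched in code. Everything below is an assumption made for illustration: the biased coin is simulated with Python's `random.Random`, the function names are hypothetical, and blocks of four symbols with rejection of values above 9 are used as in the earlier illustration.

```python
import random


def chained_bit(k: int, p_heads: float, rng: random.Random) -> int:
    """Product of k simulated tosses (+1 heads, -1 tails), coded 1 for +1 and 0 for -1."""
    product = 1
    for _ in range(k):
        product *= 1 if rng.random() < p_heads else -1
    return 1 if product == 1 else 0


def decimal_digits(count: int, k: int, p_heads: float, seed: int = 0) -> list:
    """Collect `count` decimal digits from four-symbol binary blocks, rejecting 10..15."""
    rng = random.Random(seed)
    digits = []
    while len(digits) < count:
        value = 0
        for _ in range(4):
            value = 2 * value + chained_bit(k, p_heads, rng)
        if value <= 9:  # keep 0000 .. 1001 (binary), reject the remaining six blocks
            digits.append(value)
    return digits


# Hypothetical coin with P(heads) = 0.6, chained over k = 5 tosses per binary symbol.
sample = decimal_digits(20, k=5, p_heads=0.6, seed=1)
```

By formula (3), a coin with $p = 0.6$ chained over $k = 5$ tosses gives a binary symbol whose bias is $\frac{1}{2}(0.2)^5 = 0.00016$.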

REFERENCE 

[1] Tippett, L. H. C., Random Sampling Numbers, Tracts for Computers, No. 15, Cambridge
University Press, 1927.


NOTE ON THE ERROR IN INTERPOLATION OF A FUNCTION OF TWO 
INDEPENDENT VARIABLES 

By W. M. Kincaid 

University of Michigan 

Suppose that $g$ is a function of one real variable $x$ and $h$ is an interpolation
function such that $g(x) = h(x)$ for $x = x_1, x_2, \cdots, x_n$. Let $f(x) = g(x) - h(x)$
and suppose that $\frac{d^n}{dx^n} f(x)$ exists in an interval containing the points $x_0, x_1, \cdots, x_n$.
Then the error in interpolation may be estimated from the well-known
relation

(1) $$f(x_0) = \frac{f^{(n)}(\xi)}{n!}(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n),$$

where $\xi$ is some point in the smallest interval containing $x_0, x_1, \cdots, x_n$.

In the most usual case, where $h(x)$ is a polynomial of degree less than $n$, we
have $f^{(n)}(x) = g^{(n)}(x)$.

It is natural to consider the corresponding situation for functions of two
independent real variables $x$ and $y$. Let $g$ and $h$ be two functions such that $g(x, y) = h(x, y)$
for $n$ points $x = x_i$, $y = y_i$ $(i = 1, 2, \cdots, n)$. Setting $f(x, y) = g(x, y) - h(x, y)$
as before, we have $f(x_i, y_i) = 0$ for $i = 1, 2, \cdots, n$. Then if $(x_0, y_0)$
is a point at which $g$ and $h$ are defined, we may ask whether there is any formula
corresponding to (1) from which the error $f(x_0, y_0)$ can be estimated.

Some restrictions must be placed upon the function $f$ if any interesting results
are to be obtained. Let us suppose that $f(x, y)$ can be expanded in a Taylor
series about each of the points $(x_i, y_i)$ with a region of convergence
sufficient to include all the points of the set. These conditions are more
stringent ones than will be required for obtaining the later results; on the other
hand, they would almost always be satisfied in any practical problem of interpolation,
so it scarcely seems worthwhile to look for the weakest possible conditions
at this point.

The first case of real interest is $n = 3$. It follows from the general statement
of Taylor's theorem with the remainder that

(2) $$\begin{aligned} 0 = f(x_i, y_i) = {} & f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) \\ & + \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right] \quad (i = 1, 2, 3), \end{aligned}$$

where $(\xi_i, \eta_i)$ is a point on the line segment joining $(x_0, y_0)$ and $(x_i, y_i)$ for $i = 1, 2, 3$.

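The equation (2) may be regarded as a set of three linear equations in the two
quantities $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$. The condition that these shall be consistent is

(3) $$\begin{vmatrix} f(x_0, y_0) + U_1 & x_1 - x_0 & y_1 - y_0 \\ f(x_0, y_0) + U_2 & x_2 - x_0 & y_2 - y_0 \\ f(x_0, y_0) + U_3 & x_3 - x_0 & y_3 - y_0 \end{vmatrix} = 0,$$

where

$$U_i = \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right].$$

Provided the points $(x_i, y_i)$ $(i = 1, 2, 3)$ do not lie on a straight line, this gives

(4) $$f(x_0, y_0) = - \frac{\begin{vmatrix} U_1 & x_1 - x_0 & y_1 - y_0 \\ U_2 & x_2 - x_0 & y_2 - y_0 \\ U_3 & x_3 - x_0 & y_3 - y_0 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{vmatrix}}.$$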

This expression is analogous to (1), though far less simple and elegant in form. 
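Formula (4) can be checked in a case where the quantities $U_i$ are known exactly. The sketch below is an illustration under assumptions introduced here: $g$ is a hypothetical quadratic (so its second derivatives are constant and the points $(\xi_i, \eta_i)$ play no role), $h$ is the plane through three chosen points, and all names are made up for the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (p, q, r) = m
    return a * (e * r - f * q) - b * (d * r - f * p) + c * (d * q - e * p)


def g(x, y):
    """Hypothetical function to be interpolated (a quadratic)."""
    return 1 + 2 * x - y + 3 * x * x + x * y - 2 * y * y


# Constant second derivatives of g; since h is linear these are also those of f = g - h.
G_XX, G_XY, G_YY = 6.0, 1.0, -4.0


def plane_through(points):
    """Coefficients (a, b, c) of h = a + b*x + c*y agreeing with g at three points."""
    rows = [[1.0, x, y] for x, y in points]
    rhs = [g(x, y) for x, y in points]
    d = det3(rows)
    coeffs = []
    for col in range(3):  # Cramer's rule, one column at a time
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return coeffs


def error_direct(points, x0, y0):
    """f(x0, y0) = g(x0, y0) - h(x0, y0), computed directly."""
    a, b, c = plane_through(points)
    return g(x0, y0) - (a + b * x0 + c * y0)


def error_formula_4(points, x0, y0):
    """f(x0, y0) from the ratio of determinants in (4)."""
    numerator = []
    for x, y in points:
        dx, dy = x - x0, y - y0
        u = 0.5 * (dx * dx * G_XX + 2 * dx * dy * G_XY + dy * dy * G_YY)
        numerator.append([u, dx, dy])
    denominator = [[1.0, x, y] for x, y in points]
    return -det3(numerator) / det3(denominator)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
direct = error_direct(points, 0.25, 0.5)
by_formula = error_formula_4(points, 0.25, 0.5)
```

For a general $f$ the second derivatives in $U_i$ are evaluated at unknown intermediate points, so in practice (4) yields bounds rather than exact values.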

A similar treatment can evidently be used in all cases of the type $n = \frac{1}{2}m(m + 1)$.
For example, for $n = 6$ the equation corresponding to (4) is

(5) $$f(x_0, y_0) = - \frac{\begin{vmatrix} V_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 & (x_1 - x_0)(y_1 - y_0) & (y_1 - y_0)^2 \\ V_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 & (x_2 - x_0)(y_2 - y_0) & (y_2 - y_0)^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ V_6 & x_6 - x_0 & y_6 - y_0 & (x_6 - x_0)^2 & (x_6 - x_0)(y_6 - y_0) & (y_6 - y_0)^2 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\ 1 & x_2 & y_2 & x_2^2 & x_2 y_2 & y_2^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_6 & y_6 & x_6^2 & x_6 y_6 & y_6^2 \end{vmatrix}},$$

where

$$\begin{aligned} V_i = \tfrac{1}{6}\big[ & (x_i - x_0)^3 f_{xxx}(\xi_i, \eta_i) + 3(x_i - x_0)^2 (y_i - y_0) f_{xxy}(\xi_i, \eta_i) \\ & + 3(x_i - x_0)(y_i - y_0)^2 f_{xyy}(\xi_i, \eta_i) + (y_i - y_0)^3 f_{yyy}(\xi_i, \eta_i)\big] \quad (i = 1, 2, \cdots, 6). \end{aligned}$$

(Equation (5) breaks down only if the six points $(x_1, y_1), \cdots, (x_6, y_6)$ lie on a
single conic.)

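As an example of the general case we may consider $n = 4$. We write

$$\begin{aligned} f(x_i, y_i) = {} & f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) \\ & + \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right] \quad (i = 1, 2, 3, 4). \end{aligned}$$

Now,

$$f_{xx}(\xi_i, \eta_i) = f_{xx}(x_0, y_0) + (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i'),$$

where $(\xi_i', \eta_i')$ is a point on the line segment between $(x_0, y_0)$ and $(\xi_i, \eta_i)$.
Proceeding as before yields

(6) $$f(x_0, y_0) = - \frac{\begin{vmatrix} W_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 \\ W_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 \\ W_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2 \\ W_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 & x_1^2 \\ 1 & x_2 & y_2 & x_2^2 \\ 1 & x_3 & y_3 & x_3^2 \\ 1 & x_4 & y_4 & x_4^2 \end{vmatrix}},$$

with

$$\begin{aligned} W_i = \tfrac{1}{2}\big[ & (x_i - x_0)^2 (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (x_i - x_0)^2 (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i') \\ & + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\big]. \end{aligned}$$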

Corresponding formulas can be derived in this way for any value of $n$; in fact,
several alternatives may be obtained in each case. In all cases the error $f(x_0, y_0)$
is given in terms of the derivatives of $g$ alone if a polynomial of a certain type is
used for the interpolating function. For equation (4), the suitable polynomial
would be $h(x, y) = a + bx + cy$; for (5), $h(x, y) = a + bx + cy + dx^2 + exy + fy^2$;
for (6), $h(x, y) = a + bx + cy + dx^2$. If the interpolating function $h(x, y)$
is not so chosen, the formulas remain valid, but derivatives of $h$ will appear.

The same procedure is applicable to functions of any number of independent 
variables. 


ON A LEMMA BY KOLMOGOROFF 

By Kai-Lai Chung 
Princeton University 

The following lemma was proved by Kolmogoroff [1]:

If $e_1, e_2, \cdots, e_n$ are independent events and $U$ an arbitrary event such that
($W(X)$ denoting the probability of $X$ and $W_e(X)$ the conditional probability of $X$
under the hypothesis of $e$)

$$W_{e_k}(U) \ge u, \qquad W(e_1 + \cdots + e_n) \ge u,$$

then

$$W(U) \ge \tfrac{1}{9} u^2.$$

This result seems of some interest in itself and may also have practical applications,
for it is easily seen [2] that in general if $e_1, e_2, \cdots, e_n$ are arbitrary no
information about $W_{e_1 + \cdots + e_n}(U)$ can be obtained from that about $W_{e_k}(U)$,
$k = 1, \cdots, n$. From this point of view the constant 1/9 is interesting, though
it is unimportant in Kolmogoroff’s proof of the law of large numbers. Using his
original method this constant can easily be improved to 1/8. However, the following
method will give a better result. At the same time we shall put it into
a more general form.

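Let

$$W_{e_k}(U) \ge \alpha, \qquad \sum_{k=1}^{n} W(e_k) \ge \beta.$$

Then we have for $1 \le k \le n$,

(1) $$W(U) \ge W(U(e_1 + \cdots + e_k)) = W(Ue_1 + \cdots + Ue_k).$$

Now a simple case of certain inequalities due to Bonferroni and Fréchet [3]
states that for arbitrary events $E_1, \cdots, E_k$ we have

(2) $$W(E_1 + \cdots + E_k) \ge \sum_{i=1}^{k} W(E_i) - \sum_{1 \le i < j \le k} W(E_i E_j).$$

Applying this to (1), we obtain

$$W(U) \ge \sum_{i=1}^{k} W(Ue_i) - \sum_{1 \le i < j \le k} W(Ue_i e_j) \ge \sum_{i=1}^{k} W(e_i) W_{e_i}(U) - \sum_{1 \le i < j \le k} W(e_i) W(e_j),$$

using the independence of $e_1, \cdots, e_k$. Hence

$$W(U) \ge \alpha \sum_{i=1}^{k} W(e_i) - \tfrac{1}{2}\left(\sum_{i=1}^{k} W(e_i)\right)^2 + \tfrac{1}{2}\sum_{i=1}^{k} W^2(e_i).$$

By Cauchy’s inequality,

$$\sum_{i=1}^{k} W^2(e_i) \ge \frac{1}{k}\left(\sum_{i=1}^{k} W(e_i)\right)^2.$$

Writing $\Sigma_k = \sum_{i=1}^{k} W(e_i)$, we have

(3) $$W(U) \ge \left[\alpha - \tfrac{1}{2}\left(1 - \frac{1}{k}\right)\Sigma_k\right]\Sigma_k.$$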

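Now let $0 < \gamma < \gamma_0 \le 1$ where $\gamma$ and $\gamma_0$ are to be determined later. If there is
an $e_i$, $1 \le i \le n$, such that $W(e_i) \ge \gamma\beta$, then

(4) $$W(U) \ge W(Ue_i) = W(e_i) W_{e_i}(U) \ge \gamma\alpha\beta.$$

If every $W(e_i) < \gamma\beta$, we determine $k$ $(> 1)$ such that

$$\Sigma_{k-1} < \gamma_0\beta \le \Sigma_k;$$

thus

$$\Sigma_k < \Sigma_{k-1} + \gamma\beta < (\gamma_0 + \gamma)\beta.$$

And (3) yields

(5) $$W(U) \ge \left[\alpha - \tfrac{1}{2}\left(1 - \frac{1}{n}\right)(\gamma_0 + \gamma)\beta\right]\gamma_0\beta.$$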





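Now we choose $\gamma$ so that the last terms in (4) and (5) be equal. This gives

$$\gamma = \frac{\alpha - \frac{1}{2}\left(1 - \frac{1}{n}\right)\gamma_0\beta}{\alpha + \frac{1}{2}\left(1 - \frac{1}{n}\right)\gamma_0\beta}\,\gamma_0.$$

To maximize $\gamma$, we put $\dfrac{d\gamma}{d\gamma_0} = 0$ and find

$$\gamma_0 = \frac{2(\sqrt{2} - 1)\alpha}{\beta}.$$

If $2(\sqrt{2} - 1)\alpha \le \beta$, this choice of $\gamma_0$ is admissible, and we obtain

$$\gamma = \frac{2 - \sqrt{2} + \frac{1}{n}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{n}(\sqrt{2} - 1)} \cdot \frac{2(\sqrt{2} - 1)\alpha}{\beta}.$$

Thus we get (the first inequality being retained for small values of $n$)

(6) $$W(U) \ge \frac{2 - \sqrt{2} + \frac{1}{n}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{n}(\sqrt{2} - 1)}\, 2(\sqrt{2} - 1)\alpha^2 \ge 2(\sqrt{2} - 1)^2\alpha^2 > \tfrac{1}{3}\alpha^2.$$

In case $2(\sqrt{2} - 1)\alpha > \beta$, we choose $\gamma_0 = 1$, and we obtain

$$\gamma = \frac{\alpha - \frac{1}{2}\left(1 - \frac{1}{n}\right)\beta}{\alpha + \frac{1}{2}\left(1 - \frac{1}{n}\right)\beta}.$$

Thus we get

(7) $$W(U) \ge \frac{\alpha - \frac{1}{2}\left(1 - \frac{1}{n}\right)\beta}{\alpha + \frac{1}{2}\left(1 - \frac{1}{n}\right)\beta}\,\alpha\beta \ge \frac{2\alpha - \beta}{2\alpha + \beta}\,\alpha\beta.$$

If we write $\beta = \eta\alpha$, we have

$$W(U) \ge \frac{2 - \eta}{2 + \eta}\,\eta\alpha^2.$$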



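We summarize (6) and (7) in the following table:

    $\beta/\alpha$    $\ge 2(\sqrt{2} - 1)$                 $= \eta < 2(\sqrt{2} - 1)$
    $W(U)$            $\ge 2(\sqrt{2} - 1)^2 \alpha^2$      $\ge \dfrac{2 - \eta}{2 + \eta}\,\eta\alpha^2$

Thus for Kolmogoroff’s case ($\eta = 1$) we have $W(U) > \frac{1}{3}\alpha^2$.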
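The two regimes of the table can be tabulated as a single function of $\eta = \beta/\alpha$. The sketch below assumes the reading of (6) and (7) given above; the function name and the choice to return the coefficient of $\alpha^2$ are illustrative only.

```python
from math import sqrt

SWITCH = 2.0 * (sqrt(2.0) - 1.0)  # value of beta/alpha at which the two regimes meet


def lower_bound_coefficient(eta: float) -> float:
    """Coefficient c(eta) such that W(U) >= c(eta) * alpha**2, with eta = beta/alpha > 0."""
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    if eta >= SWITCH:
        return 2.0 * (sqrt(2.0) - 1.0) ** 2
    return (2.0 - eta) * eta / (2.0 + eta)


kolmogoroff_case = lower_bound_coefficient(1.0)     # eta = 1
at_switch_left = (2.0 - SWITCH) * SWITCH / (2.0 + SWITCH)
at_switch_right = lower_bound_coefficient(SWITCH)
```

The two expressions agree at $\eta = 2(\sqrt{2} - 1)$, and for $\eta = 1$ the coefficient exceeds both 1/9 and 1/8.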

REFERENCES 

[1] A. Kolmogoroff, “Bemerkungen zu meiner Arbeit ‘Über die Summen zufälliger
Grössen’,” Math. Annalen, Vol. 102 (1929), pp. 484-488.

[2] K. L. Chung, “On mutually favorable events,” Annals of Math. Stat., Vol. 13 (1942),
pp. 338-349.

[3] M. Fréchet, Les probabilités associées à un système d'événements compatibles et dépendants,
Première partie, Hermann, Paris, 1939, p. 59.


APPROXIMATE WEIGHTS 

By John W. Tukey 
Princeton University 

1. Summary. The greatest fractional increase in variance when a weighted
mean is calculated with approximate weights is, quite closely, the square of the
largest fractional error in an individual weight. The average increase will be
about one-half this amount.

The use of weights accurate to two significant figures, or even to the nearest
number of the form: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30,
32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, or 95, that is
to say, of the form 10(1)20(2)50(5)100 × 10^r, can thus reduce efficiency by at
most ¼ percent, which is negligible in almost all applications.

2. Proof. Let the optimum weights be $W_i$, $i = 1, 2, \cdots, n$, with $W_i > 0$,
where it is convenient to choose the normalization $\sum W_i = 1$. Let $\sigma^2$ be the
variance of $\sum W_i x_i$; then the variance of each $x_i$ must be $\sigma^2/W_i$, and since this
is a weighted mean, the means of the $x_i$ are the same.

Let the approximate weights be $W_i(1 + \lambda\theta_i)$, where $0 < \lambda < 1$ and $|\theta_i| \le 1$,
$i = 1, 2, \cdots, n$. Thus $\lambda$ is the largest fractional error which may be made
in the situation considered. We need the weak requirement $\lambda < 1$. The approximately
weighted mean is

$$\frac{\sum W_i(1 + \lambda\theta_i) x_i}{\sum W_i(1 + \lambda\theta_i)} = \sum W_i \frac{1 + \lambda\theta_i}{1 + \lambda\bar{\theta}}\, x_i,$$

where $\bar{\theta} = \sum W_i\theta_i$. Its variance is

$$\sigma^2 \sum W_i \frac{(1 + \lambda\theta_i)^2}{(1 + \lambda\bar{\theta})^2} = \sigma^2\left\{1 + \lambda^2 \frac{\sum W_i(\theta_i - \bar{\theta})^2}{(1 + \lambda\bar{\theta})^2}\right\}$$

and, since $\theta_i^2 \le 1$, this is bounded by

$$\sigma^2\left\{1 + \lambda^2 \frac{1 - \bar{\theta}^2}{(1 + \lambda\bar{\theta})^2}\right\}.$$

Now the only maximum of this expression for $|\bar{\theta}| \le 1$ occurs when $\bar{\theta} = -\lambda$,
and the bound becomes

$$\sigma^2\left\{1 + \frac{\lambda^2}{1 - \lambda^2}\right\}.$$

This proves the first statement in the summary. 
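The bound can be compared with the exact variance ratio for any particular set of errors. The sketch below is an illustration only: the weights and the values of $\theta_i$ are hypothetical, and the function names are introduced here.

```python
def variance_factor(weights, thetas, lam):
    """Exact ratio of the variance under weights W_i(1 + lam*theta_i) to the optimum variance."""
    total = sum(weights)
    w = [x / total for x in weights]                      # normalize so the W_i sum to 1
    theta_bar = sum(wi * th for wi, th in zip(w, thetas))
    spread = sum(wi * (th - theta_bar) ** 2 for wi, th in zip(w, thetas))
    return 1.0 + lam ** 2 * spread / (1.0 + lam * theta_bar) ** 2


def variance_factor_direct(weights, thetas, lam):
    """The same ratio from the definition: sum W_i(1+lam*theta_i)^2 / (sum W_i(1+lam*theta_i))^2."""
    total = sum(weights)
    w = [x / total for x in weights]
    num = sum(wi * (1.0 + lam * th) ** 2 for wi, th in zip(w, thetas))
    den = sum(wi * (1.0 + lam * th) for wi, th in zip(w, thetas)) ** 2
    return num / den


def variance_bound(lam):
    """Upper bound 1 + lam^2 / (1 - lam^2) on the ratio."""
    return 1.0 + lam ** 2 / (1.0 - lam ** 2)


# Hypothetical optimum weights and signed relative errors theta_i in [-1, 1].
weights = [0.5, 0.3, 0.2]
thetas = [1.0, -1.0, 0.4]
lam = 5 / 105

exact = variance_factor(weights, thetas, lam)
direct = variance_factor_direct(weights, thetas, lam)
bound = variance_bound(lam)
```

With $\lambda = 5/105$ the bound equals $441/440$, the value 1.0023 quoted for two-figure rounding.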

The greatest fractional change which occurs when a number is approximated
by one of the form 10(1)20(2)50(5)100 × 10^r is 5/105, which occurs, for example,
when 10.499999···, is replaced by 10. The same estimate applies to
an approximation to two significant figures. The variance is thus multiplied
by a factor bounded by

$$1 + \frac{(5/105)^2}{1 - (5/105)^2} = 1.0023,$$

which proves the second statement.

The use of a weight of the simpler form 10, 15, 20, 30, 40, 50, 70, times a
power of ten is seen in the same way to lead to an increase in variance and a
decrease in efficiency of at most 4¼ percent.

3. Comment. It is interesting to compare the 90 possible values for 2 significant
figures, the 35 possible values for the numbers proposed above, which
might be called two curtailed significant figures, and the 24 possible values for
logarithmic spacing at interval $(1.05)^2$, all of which extend over one power of
ten with the same maximum fractional error in rounding. The use of the curtailed
scheme for critical tables of weights and weighting coefficients would save
more than 60 percent of the entries needed for two complete significant figures.
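The counts quoted in the comment can be reproduced by enumeration. This is a small sketch with hypothetical variable names; the fractional error of a rounding interval $(a, b)$ is taken as half the interval divided by its midpoint, as in the 5/105 example above.

```python
from math import ceil, log

# Two full significant figures: 10, 11, ..., 99.
two_figures = list(range(10, 100))

# Curtailed scheme 10(1)20(2)50(5)100: steps of 1, then 2, then 5.
curtailed = list(range(10, 20)) + list(range(20, 50, 2)) + list(range(50, 100, 5))

# Logarithmic spacing at ratio (1.05)^2 over one power of ten.
log_steps = ceil(log(10) / log(1.05 ** 2))

# Largest fractional rounding error in the curtailed scheme, including the step up to 100.
grid = curtailed + [100]
max_error = max((b - a) / (a + b) for a, b in zip(grid, grid[1:]))

saving = 1 - len(curtailed) / len(two_figures)
```

The largest error is attained on the intervals 10 to 11, 20 to 22 and 50 to 55, each giving 1/21 = 5/105.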

This device applies equally well to other numbers of significant figures. 



ON THE USE OF THE NON-CENTRAL t-DISTRIBUTION FOR COMPARING
PERCENTAGE POINTS OF NORMAL POPULATIONS

By John E. Walsh 
Princeton University 

1. Introduction. Consider two normal populations with the same variance
and means $\mu$ and $\nu$ respectively. It is well known that confidence intervals and
significance tests can be obtained for the difference $\mu - \nu$. Since $\mu$ is the 50%
point of the first population and $\nu$ is the 50% point of the second population,
this represents a particular solution of the general problem of obtaining confidence
intervals and significance tests for the difference $\theta_\alpha - \varphi_\beta$, where $\theta_\alpha$ is
the $\alpha$ percent point of the first population and $\varphi_\beta$ is the $\beta$ percent point of the
second population. The purpose of this note is to point out that the results of
Johnson and Welch [1] for the non-central t-distribution can be used to furnish
a solution of the general problem.

2. Analysis. Let $A_\gamma$ be the $\gamma$ percent point of the normal population with
zero mean and unit variance (i.e. exactly $\gamma$% of the population has values less
than $A_\gamma$). Then if $\sigma$ is the common standard deviation,

$$\theta_\alpha = \mu + A_\alpha\sigma, \qquad \varphi_\beta = \nu + A_\beta\sigma.$$

Thus

$$\theta_\alpha - \varphi_\beta = (\mu - \nu) + (A_\alpha - A_\beta)\sigma.$$
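The relation between percent points and the standard normal points $A_\gamma$ can be illustrated with the standard library. In the sketch below the function names and the numerical values of $\mu$, $\nu$ and $\sigma$ are assumptions made for the example; `statistics.NormalDist.inv_cdf` supplies the normal percent points.

```python
from statistics import NormalDist


def standard_point(gamma_percent: float) -> float:
    """A_gamma: the value below which gamma percent of the standard normal population lies."""
    return NormalDist().inv_cdf(gamma_percent / 100.0)


def point_difference(mu: float, nu: float, sigma: float, alpha: float, beta: float) -> float:
    """theta_alpha - phi_beta = (mu - nu) + (A_alpha - A_beta) * sigma."""
    return (mu - nu) + (standard_point(alpha) - standard_point(beta)) * sigma


# Hypothetical populations: means 3 and 1, common standard deviation 2.
mu, nu, sigma = 3.0, 1.0, 2.0

# Comparing the two 50% points reduces to the difference of the means.
median_difference = point_difference(mu, nu, sigma, 50.0, 50.0)

# 90% point of the first population against the 10% point of the second,
# computed from the formula and directly from the two populations.
by_formula = point_difference(mu, nu, sigma, 90.0, 10.0)
direct = NormalDist(mu, sigma).inv_cdf(0.90) - NormalDist(nu, sigma).inv_cdf(0.10)
```

When $\alpha = \beta$ the term in $\sigma$ vanishes, which is why the difference of means is the special case mentioned in the introduction.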

The non-central t-distribution investigated by Johnson and Welch in [1] is
based on the quantity

$$t = (z + \delta)\big/\sqrt{\chi^2/f},$$

where $z$ has a normal distribution with zero mean and unit variance, $\delta$ is a constant,
and $\chi^2$ has a $\chi^2$-distribution with $f$ degrees of freedom and is distributed
independently of $z$. Methods and tables are given in [1] whereby a constant
$t(f, \delta, \epsilon)$ can be computed having the property that

$$\Pr[t > t(f, \delta, \epsilon)] = \epsilon.$$

These relations will be used to obtain confidence intervals for $\theta_\alpha - \varphi_\beta$. The
resulting confidence intervals can be used to obtain significance tests for $\theta_\alpha - \varphi_\beta$.

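Let $x_1, \cdots, x_n$ be a random sample of size $n$ from the first population while
$y_1, \cdots, y_m$ is a random sample of size $m$ from the second population. Then
consider

$$t = \frac{\bar{x} - \bar{y} - (\theta_\alpha - \varphi_\beta)}{\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}\,\sqrt{\dfrac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{m + n - 2}}}.$$

This quantity has a non-central t-distribution with

$$\delta = (A_\beta - A_\alpha)\Big/\sqrt{\frac{1}{m} + \frac{1}{n}}, \qquad f = m + n - 2.$$

For notational simplicity let

$$t(\epsilon) = t(f, \delta, \epsilon), \qquad \sum (x_i - \bar{x})^2 = S_1^2, \qquad \sum (y_j - \bar{y})^2 = S_2^2.$$

Then one-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient $\epsilon$
are given by

$$\theta_\alpha - \varphi_\beta < \bar{x} - \bar{y} - \frac{t(\epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}},$$

$$\theta_\alpha - \varphi_\beta > \bar{x} - \bar{y} - \frac{t(1 - \epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}}.$$

Two-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient
$1 - (\epsilon_1 + \epsilon_2)$ are given by

$$\bar{x} - \bar{y} - \frac{t(\epsilon_2)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}} < \theta_\alpha - \varphi_\beta < \bar{x} - \bar{y} - \frac{t(1 - \epsilon_1)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}},$$

where $\epsilon_1 + \epsilon_2 < 1$.

REFERENCE

[1] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution”,
Biometrika, Vol. 31 (1940), pp. 362-389.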



THE TEACHING OF STATISTICS 


A report of the Institute of Mathematical Statistics Committee on the
Teaching of Statistics 1

PREFATORY NOTE 

This report on the teaching of statistics contains two parts. Part I is a sum¬ 
mary of the conclusions reached by the committee concerning the appropriate 
content and organization of teaching in statistics. It is oriented towards the 
future, and is intended as a program for action. Part II, mainly the work of the 
chairman of the committee, is a more intensive discussion of the general problem. 
It surveys the present state of the teaching of statistics, probes some of the 
reasons for existing weaknesses in this teaching, and states more fully the basis 
for the conclusions summarized in Part I. 

Additional material, with special reference to applied statistics, is contained 
in a report of The Committee on Applied Mathematical Statistics of the National 
Research Council, entitled Personnel and Training Problems Created by the 
Recent Growth of Applied Statistics in the United States . 2 

PART I 

SUMMARY OF CONCLUSIONS 

1. Who are the prospective students of statistics? A complete teaching program
in statistics must be designed to meet the needs of four principal categories
of students, listed here according to the amount of training in statistics that is
needed to meet their requirements.

a. All college students. Statistical method is a vital branch of scientific
method. It is widely used in most sciences, business, government, and ordinary
life. Some understanding of the nature of inductive inference from quantitative
data on the basis of the theory of probability as portrayed in statistical method
is an indispensable part of a liberal education.

b. Future consumers of statistics . Some students will specialize in adminis¬ 
tration, business, or other subject-matter that will require them to understand 
the results of statistical analyses of special problems, although they themselves 
do not make these analyses. For example, business executives and government 
administrators must frequently base action on statistical studies. Research 
workers and teachers in many fields may not themselves use statistical methods, 
yet in order to keep abreast of their own or cognate fields they must read and 
understand studies using statistical methods. 

1 The Committee consists of Harold Hotelling, Chairman; Walter Bartky, W. Edwards
Deming, Milton Friedman, and Paul Hoel.

2 Copies may be obtained from the National Research Council, 2101 Constitution Ave.,
Washington 25.

c. Future users of statistical methods. A still smaller group of students of
statistics are training themselves for careers of specialization in economics, population,
sociology, housing, business, business research, industrial design, industrial
production, personnel, purchasing, public opinion, biology, agricultural
science, metallurgy, physics, chemistry, psychology, or some other field that
makes extensive use of statistics. Research in these fields often requires the
use of advanced statistical techniques, and even the development of new statistical
theory. Students planning to do such research need statistical theory and
methods as a tool.

d. Future producers and teachers of statistical methods. The smallest, but in 
many respects most crucial group of students of statistics, are those who intend 
to specialize in statistical methods for the sake of statistical methodology. 
Many of these will become teachers or full-time research workers, though some 
will find posts in government and industry in high-grade statistical work, fre¬ 
quently requiring the development of new statistical theory and methods. 
These students will become tool-makers. 

2. What should they be taught? 

a. All college students. 3 The fundamental logic and philosophy of statistics
can be taught at an early stage. It is perhaps an appropriate subject to include 
in the kind of survey courses of physical or social sciences that have become so 
common in recent years. Three or four weeks of lectures and discussions should 
suffice to acquaint the students with the broad principles of inductive inference. 
No mathematics need be included, although some elementary experiments may 
well be performed to instil the concepts of sampling variation, randomness, and 
statistical predictability. The student even at this stage can be made to recog¬ 
nize the fundamentally statistical character of most decisions, arising from the 
fact that they involve an element of uncertainty and a balancing of the impor¬ 
tance of different types of errors. The student can be made to understand the 
fundamental difference between inductive and deductive statements, the nature 
of statistical estimation, and the nature of a statistical hypothesis. These 
concepts can be made concrete by illustrating them in terms of problems ranging 
from everyday questions such as whether to cross a street in the middle of the 
block on up to such vital problems as the construction of an appropriate social 
security plan, or the design of an efficient experiment for selecting the best variety 
of corn, or the selection of the best method of testing for the presence of a disease.

3 This recommendation is almost an exact parallel of one made by a committee on the
teaching of statistics, appointed by the Royal Statistical Society and published by the
Society in 1947 as a report to the Council; later published in the Journal of the Royal
Statistical Society, vol. cx, Part I, 1947.

b. Future consumers of statistics. Future consumers of statistics need two
kinds of training in statistics. First, they need some knowledge of the kind of
statistical material available in their field of specialization; of the sources of
such data; and of their limitations. To meet this need they require what may
be called “descriptive statistics,” which places special emphasis on their own
field of specialization. A one-quarter or one-semester course in some department
or division (e.g., in the social sciences, or biological sciences) should meet
this need. In addition, they need a reasonably thorough understanding of what
statistics can and cannot do, what the major statistical techniques are, and how
to interpret the results obtained by the application of such techniques. This
need may be met for those students who have some mathematical background
by all or part of the fundamental one-year course discussed in the next section.
For students lacking this background, special courses along similar lines will
be required.

c. Future users of statistical methods . It is essential for fruitful application 
that users of statistical methods should not mechanically apply procedures 
learned by rote or taken from a manual. Since few research problems fit per¬ 
fectly into clearly defined patterns, nothing is so important to the successful 
collection and analysis of statistical data as adaptability and flexibility in using 
techniques. These require a thorough comprehension of the logical foundations 
of statistics, especially of the assumptions underlying its various technical 
devices, and sufficient knowledge of the derivations of these devices to be able to 
adapt them to the special circumstances that inevitably develop. To provide 
this background, a minimum of a full year fundamental course in statistical 
methods is essential, followed by courses of application. It is highly desirable 
that this fundamental course be based on calculus as a prerequisite, because with¬ 
out it a proper understanding of the development of statistical techniques cannot 
be attained. But this is probably impossible at present, in view of the unfor¬ 
tunately low level of mathematical training of most college students. As an 
expedient, and it is hoped a temporary expedient, it is recommended that the 
fundamental course be given in two sections, one requiring calculus, the other 
only a knowledge of first-year college algebra. A single course (or pair of courses, 
in line with the temporary expedient just mentioned) should suffice for all
departments, because the core of statistical methods is common to all fields of study.
Given in this way, the fundamental course can have the advantage of being 
taught by the most competent statisticians in the institution. 

In addition to a thorough training in theory and methods, users of statistical 
methods need training in applications. This can be provided by courses in 
various applied fields. It is usually advisable that these courses be given in the 
department of application (agriculture, population, engineering, economics, 
psychology, etc.), and require the fundamental one-year course as a prerequisite. 

d. Future research workers and teachers of statistical method. The future 
research workers and teachers of statistical method clearly require far more 
intensive training in theory than has so far been suggested. A fundamental 
prerequisite to such training is knowledge of some advanced mathematics. It 
is difficult to specify exactly what or how much mathematics is necessary, but 
something of the algebra of matrices and of the theory of functions are minimum 
necessities, and a good deal of additional knowledge of algebra, geometry, and 
analysis add richness and power to the work of the statistical theorist. 

In addition to advanced mathematics and advanced work in statistical method, 
the future statistical theorist needs a good deal of work on applications, in the 
form either of experience or courses. He will be a tool-maker, and needs to
know by personal experience something of the problems of those who use his 
tools. One satisfactory arrangement is an internship in statistical research, 
as is currently provided by some institutions. By this arrangement, interns 
work under competent leadership in various government or private agencies 
that are engaged in large-scale statistical studies. The interns do research in 
theory, adapt the physical circumstances to theory and vice versa, and have 
actual practice in the design of experiments, the construction of questionnaires, 
writing of instructions, planning tabulations, analyzing the results, and exam¬ 
ining sampling variances. 

It is obvious that proper advanced courses in statistics will for many years be 
the province of a few institutions only, as there does not exist at present an ade¬ 
quate professional body to man more than a few. 

3. Who should teach statistics? It is clear from the preceding section that 
two different kinds of courses are required to meet the needs of students of 
statistics: first, courses in statistical method and methodology; and second, 
courses in applications of statistical methods to particular fields. 

The most important requirement for a successful university program in statis¬ 
tics is that courses in statistical method and methodology should be taught by a 
statistical theorist, a man who has had the training outlined in Art. 2d above, 
is specializing in statistics, is doing research in statistical method, and who has 
had some first-hand acquaintance with applications of statistical techniques. 
This is the only way such courses can be kept abreast of developments and 
sufficiently broad to meet the needs of all departments. This recommendation 
may seem to belabor the obvious, but a glance at the qualifications of most 
people currently teaching statistical methods will show why it is necessary. 

Most courses in applications should be taught by people thoroughly conver¬ 
sant with the relevant subject-matter fields as well as statistical methodology. 
Some courses in applications may be taught by statistical theorists, particularly 
new applications or applications that are common to many fields. 

4. How should the teaching of statistics be organized? The teaching program
in statistics should be organized around a separate administrative unit, an Insti¬ 
tute or Department of Statistics. This department should be primarily respon¬ 
sible for the teaching of courses in statistical methods: the fundamental course 
in statistical method described above, specialized methods for particular fields 
of application (e.g., factor analysis, time-series analysis), and advanced courses 
in statistical theory. 

In addition, the department of statistics should offer its services as a consulting 
centre on problems in statistics arising in research in other departments of the 
institution, both as a service to these other departments and because research 
in statistical methods peculiarly requires stimulation from close communication 
with applications. The department of statistics might also provide laboratory 
facilities for itself and other departments, 4 and might undertake directly, or
through an associated research staff, special assignments involving the application
of statistical methods to concrete problems.

4 See the interesting suggestions on this point on p. 14 in Personnel and Training
Problems, loc. cit.

Intermediate courses dealing primarily with applications ordinarily belong in 
other departments (agriculture, economics, demography, engineering, biology, 
etc.), although some may be given in the department of statistics. The exact 
location of courses in application will depend on the accident of the depart¬ 
mental affiliation of the persons competent to teach them. Coordination of the 
teaching program in statistics can be achieved by an interdepartmental com¬ 
mittee. The department of statistics should not, however, consist of such a 
committee under a different name. It should be a thoroughly independent de¬ 
partment, with all or most of its members entirely in the department. 

The recommendation that the responsibility for teaching statistical methods be 
centered in a separate department is based on the belief that the teaching of 
statistical methods without theory can only be uninspiring and harmful; that a 
separate department of statistics offers the only arrangement that can assure 
statistical theory being taught by competent theorists, and the only satisfactory 
arrangement for ensuring the strong incentive for statistical research, with appro¬ 
priate recognition and advancement, which is as necessary for the teaching of 
statistics as for the teaching of any other subject. 

5. What should be done about adult education? The preceding recommendations
are all directed toward the teaching of statistics to undergraduate and
graduate students. There is an additional need that these do not meet, namely, 
the provision of training to mature research workers in various fields already 
established in their professions. This need arises in part from the inadequate 
teaching of statistics in the past, but even more from the extremely rapid advance 
in the theory and practice of statistics which have made it difficult for any but 
the specialist to keep abreast of developments. Some institutions are making 
efforts to meet this need by providing evening and late-afternoon classes for
employed research workers. Such classes are feasible only in the larger centres 
of statistical activity. There is also the need of providing advanced research 
workers in particular fields with highly specialized guidance in selected topics. 
A department of statistics organized along the lines suggested above can con¬ 
tribute toward meeting this need by effective counseling of colleagues in other 
departments, and by organizing special seminars and lectures for them. The 
professional statistical associations are also contributing by arranging special 
expository programs. 





PART II 

THE PLACE OF STATISTICS IN THE UNIVERSITY 5 
Contents 

A. Minor nuisances and inefficiencies in statistical teaching 

6. Lack of coordination among departments. Lack of advanced courses and labora¬ 
tory facilities 

7. Inefficient decentralization of libraries 

B. The major evil: failure to recognize the statistical method as a science, requiring spe¬ 
cialists to teach it 

8. Too many teachers not specialists 

9. Results: students ill equipped 

10. Reasons why teachers of statistics are often not specialists 

a. The rapid growth of the subject 

b. Confusion between the statistical method and applied statistics 

c. Failure to recognize the need for continuing research 

d. The system of making appointments to teach statistics within particular 
departments that are devoted primarily to other subjects 

11. Appointments under the existing system are not all bad 

12. Unsatisfactory texts 

13. Omission of probability theory from texts and teaching 

C. Proper qualifications of teachers of statistics 

14. Statistics compared with other subjects 

15. Current research in the statistical method is essential for teachers 

16. Minimum requirements in mathematics for the training of teachers and research 
men in statistical theory 

D. Need for relating theory with applied statistics 

17. An example of the interaction between theory and practice 

18. Supplying opportunities for application in graduate studies of statistics 

E. Recommendations on the organization of statistical teaching and research in institu¬ 
tions of higher learning 

19. Research should be encouraged; teaching schedules should not be overloaded 

20. Organization of statistical service in the university 

21. Organization for teaching 

22. The statistical curriculum 

23. Statistical method as part of a liberal education 

A. MINOR NUISANCES AND INEFFICIENCIES IN STATISTICAL TEACHING 

6. Lack of coordination among departments. Lack of advanced courses and 
laboratory facilities. The teaching of statistics in American colleges and uni¬ 
versities, which has for the most part been a development since the first world 
war and has now reached large proportions, presents a number of unsatisfactory 
features. Courses in statistical methods are taught in various departments 
without coordination or inter-communication. These courses cover what is to 
a large extent the same material, but with many variations in the selection of 
subjects according to the ideas and abilities of individual instructors, and with
illustrative examples drawn in each case from material pertaining to the department
in which the course is taught. Thus a student desiring to learn more about
statistics than he can obtain in one department must, in taking courses in other
departments, repeat a great deal of what he has previously covered.

5 An earlier version of this part, prepared entirely by the chairman, is being published
by the University of California Press in a report of a symposium on probability and
statistics. The Committee as a whole made and adopted the present condensation, with W.
Edwards Deming and Milton Friedman contributing most of it. Publication of the
Berkeley symposium, including the more detailed original, has been delayed, but it is
expected to appear soon.

There is a plethora of elementary courses and a dearth of advanced ones. 
Some departments have excellent statistical laboratories which they reserve for 
the use of their own students, each with an attendant to keep others away, while 
other departments have none. Some classes in elementary statistics are too 
large and some too small, with no one in a position to equalize the sections be¬ 
tween different departments. 

7. Inefficient decentralization of libraries. The library situation is confused. 
Books on statistical methods are catalogued and shelved under Sociology, 
Economics, Business, Psychology, Zoology, Botany, Engineering, and Medicine. 
Books on probability are divided between Philosophy, Mathematics, Physics, 
and Chemistry. Books on the method of least squares are for the most part 
divided between Mathematics, Astronomy, and Civil Engineering, though some 
get into the Economics, Geology, and Physics reading-rooms. Works on the 
analysis of variance and design of experiments are likely to be concentrated under 
Agriculture, while methods of approximate evaluation of multiple integrals and 
similar purely mathematical subjects of use in statistics are, at least in one of our 
largest universities, to be found only in the library of Biology. 

B. THE MAJOR EVIL: FAILURE TO RECOGNIZE STATISTICAL METHOD AS A 
SCIENCE, REQUIRING SPECIALISTS TO TEACH IT 

8. Too many teachers not specialists. The above nuisances are but minor. 
The major evil is that those attempting to teach statistical method are all too 
often not specialists in the subject. Their original selection was seldom on the 
basis of scholarship in this field; they are not encouraged to make advanced 
studies in it; and their environment is such as to draw their attention in every 
direction except to the central truths and problems of their science. Frequently 
they lack the knowledge of mathematics necessary to begin to read the more 
serious literature of the subject that they are teaching. Many have been utterly 
unable to keep up with the rapid progress which has been taking place in statistical
methods and theory, progress which affects even the most elementary things to
be taught.

9. Results: students ill equipped. There results a widespread teaching of 
wrong theories and inefficient methods. Students are sent to the government 
service and to industrial and commercial statistical positions equipped with the 
skill that results from careful drilling in methods that ought never to be used. 
Some of these same students are encouraged and assisted to become college and 
university teachers of statistics without ever making thorough-going studies of 
the fundamentals of the subject, or exhibiting any power of making original con¬ 
tributions to it, or studying any graduate mathematics. Through the method of 
selection of teachers in general use, and through textbooks written by individuals
of this type, there is a perpetuation of obsolete ideas and unsound methods.

All this does not mean that any considerable number of people teaching statistics
are unworthy or objectionable members of the academic community. Many,
indeed, are of superior intellect, upright character, personal charm, and un¬ 
doubted teaching ability. Some are making creative contributions to other sub¬ 
jects. The only trouble is that they are teaching a subject in which they are not 
specialists, and which progresses so fast that only specialists can keep up with it. 

10. Reasons why teachers of statistics are often not specialists. The chief 
reasons for the extensive teaching of statistical method by people who are not 
specialists in it appear to be the following: 

a. The rapid growth of the subject and multiplication of its applications, creating 
a very large and very urgent demand for teaching it that could not be met im¬ 
mediately by the small existing number of scholars specializing in statistical 
method. This difficulty is aggravated by the paucity of university facilities 
for training advanced scholars in the field, so that even now the available number 
of such scholars cannot be expanded with sufficient rapidity to meet the current 
need. As specialists have not been available in anything like sufficient numbers, 
statistical method has inevitably been taught largely by non-specialists. 

b. Confusion between statistical method and applied statistics. Statistical 
method is a coherent, unified science. “Applied statistics” may mean any of
thousands of diverse things. Any particular study in applied statistics will 
ordinarily utilize some few of the results obtained by the science of statistical 
method, but will be largely concerned with matters peculiar to the particular 
application in view and others closely related to it. For example, studies of 
business cycles utilize statistical methods, good or bad, with a view to drawing 
inferences from existing data on prices, production, incomes, interest rates, bank 
reserves and the like. The main job of the applied statistician in this field is to 
study the sources and nature of the various series of observations, keeping in 
mind incidental events which may break the continuity of a series, and watching, 
with a background of economic theory and knowledge of the facts, for explana¬ 
tions. He should also be well acquainted with statistical theory, since other¬ 
wise there is grave danger of wasting or misinterpreting the laboriously accumu¬ 
lated observations. Indeed, an organization studying business cycles, or solar 
cycles, or rat psychology or cancer or practically anything else, would almost 
certainly benefit from participation by a specialist in statistical method. 

However, the chief attention in any such study will not be on statistical method 
but on features peculiar to its own scope. The specialist in statistical method 
will do well to participate occasionally in such a study, but if he does so too ex¬ 
tensively the needs of the application will so engross his attention that he cannot 
keep up with the progress of statistical method itself. 

The call of applications is enticing, and has led many young scholars to forsake 
the cultivation of statistical theory. The applications have benefited greatly 
by the process. Moreover, problems brought back in this way from applica¬ 
tions have provided valuable inspiration in developing theory. The mistake 
lies in supposing that participation in applied statistics is equivalent to specializa¬ 
tion in statistical method and theory, and the consequent appointment to teach 
the latter of persons whose sole concern is with the former. 





c. Failure to recognize the need for continuing research in the theory of statistics 
by those who teach it. There is an easy tendency to assume that all the requisite 
ideas and formulae can be found in some book, and that the duty of the teacher 
of statistics is simply to transfer this established book-knowledge to the minds 
of the students and impart to them skill in applying it. Similar attitudes ap¬ 
plied to other subjects have in the past been a drag on progress, and have long 
been discarded in respectable universities. They still hang on, however, even 
in the best institutions with respect to statistics. The spectacular advances of 
the last three decades in statistics should make it clear to anyone who has followed 
them that statistical method is far from static, that the best techniques of present- 
day statistics may tomorrow be replaced by something better, and that un¬ 
solved problems regarding the theory and methods of statistics are sticking out 
in every direction. A vast amount of research, mostly of a highly mathematical 
character, is needed and is in prospect. Anyone who does not keep in active 
touch with this research will after a short time not be a suitable teacher of statis¬ 
tics. Unfortunately, too many people like to do their statistical work as they 
say their prayers—merely substitute in a formula found in a highly respected 
book written a long time ago. 

d. The system of making appointments to teach statistics within particular departments
that are devoted primarily to other subjects. In effect, the teacher of statisti¬
cal method is too often selected by economists or sociologists or engineers or 
psychologists or medical men because he is to teach in one of these departments. 
Thus the task of selection devolves upon people unacquainted with the subject, 
though realizing the need for it in connection with a very specific application. 
Under such conditions there is an inevitable tendency to emphasize the immed¬ 
iately practical and specific at the expense of the fundamental work of wider 
applicability and greater long-run importance. Confusion between a science and 
its applications is most pronounced with those who know little about it, and the 
distinction between statistical method and applied statistics is likely to be com¬ 
pletely lost when a sociologist or an engineer is confronted with the problem of 
finding someone to teach statistics. If he does make the distinction at all he is 
likely to choose in favor of applied statistics. 

Strangely, the actual teaching that ensues is bound to consist largely of sta¬ 
tistical theory, because the students will ordinarily not have had statistical theory 
elsewhere, and they must have some in order to apply it. What often happens is 
that a sociologist or an engineer who has made some study of statistics embarks 
on what he thinks will be a career of teaching the application of statistical method 
to sociological or engineering problems, only to discover that because of the 
ignorance of the students he is compelled to teach the fundamentals of statistics, 
an entirely different subject for which he lacks preparation, talent, and interest. 

An incident of this sort has been cited previously. 6 A prominent economist
was asked to teach a course entitled “Price forecasting” in a leading university,
and accepted. He found, however, that his lectures on this subject were over
the heads of the students because he was using statistical concepts unfamiliar to
them. He therefore went back over the ground covered so as to explain these
particular statistical concepts along with their application. But in explaining
them he found himself using other statistical concepts, which in turn called for
explanation. At the end of the semester he found that he had not given the
course in price forecasting which he had planned, and for which the large class had
enrolled, but instead had taught a somewhat disordered course in elementary
statistics, a subject in which he did not feel particularly competent, and for which
the students had not come. When he was asked to teach price forecasting a year
later he proposed that a prerequisite of a course in statistics be imposed, but this
proposal was rejected by the chairman of the department, and the course was not
repeated.

6 Harold Hotelling, “The teaching of statistics,” Annals of Math. Stat., vol. xi, 1940,
pp. 467-470.

11. Appointments under the existing system are not all bad. More by acci¬ 
dent than by design in the existing system, not all statistical appointments by 
departments of application are bad. Some professors in these departments make 
conscientious excursions into statistical theory, are well advised by competent 
specialists in statistics, and bring about the appointment of men of high quality 
well acquainted with statistical method and theory of the currently best sort. 
This may work out well if the man so appointed is an able and energetic scholar 
deeply devoted to his subject, if he is placed immediately in the highest pro¬ 
fessorial rank, and if he does not feel under obligation to devote himself too ex¬ 
clusively to the special interests of the department of which he finds himself a 
member. He is then free to pursue his specialty, to keep informed on the latest 
developments in statistical method and himself to add to the subject, while at 
the same time transmitting to students a well rounded and up-to-date selection 
of knowledge. It is in this way that some of the present leaders in statistics have 
developed. It is a wrong procedure, however, to depend on accidents of this 
kind. 

The system of departmental organization and of making appointments and 
recognizing proficiency in the teaching of statistics needs to be altered. The 
usual story is typified by the appointment of a promising young scholar in sta¬ 
tistical method to a junior position in some department of application where he 
is expected to work on problems and to teach statistical methods with a sole eye
to the work of the specific department. He is then under pressure to concentrate 
on a particular kind of applied statistics, for his advancement will depend, not 
on his statistical attainments at all, but on his study of the literature, termin¬ 
ology, techniques and theories of the application. His usual associates will be 
in the department in which he is teaching rather than others teaching statistics. 
The loss, although not total, is great, because the opportunity to make the most 
of the man’s statistical ability is lost, and his ability as an economist, agricultural 
scientist, engineer, or something else that he is not particularly fitted for, is
substituted. 

A still less favorable circumstance, and unfortunately more common, is that in 
which the teacher of statistics is not even selected for scholarship in the theory 
of statistics. Studies in some other field, with some slight dabbling in the application
of statistical methods to it, plus a pleasing personality, have all too fre¬
quently been thought to comprise sufficient qualifications for teaching statistical 
methods and theory. 

12. Unsatisfactory texts. The uncritical character of the teaching is reflected 
in the long line of textbooks written by teachers who have not made any gen¬ 
uinely fundamental study of statistics, but pass on to students in a magisterial 
fashion what was passed on to them. Authority takes the place of derivations 
and ultimate sources. It is no wonder that these textbooks, copied from each 
other, contain increasing accumulations of errors; or that long delays have inter¬ 
vened between the introduction of important new statistical methods and theories 
in the periodical literature and their appearance in the textbooks and courses 
put before students. 

The latest discoveries in the theory of statistics affect what should be taught 
in elementary courses, and no syllabus can be expected to survive more than a 
few years of research. The development of new statistical methods and ideas 
of overwhelming importance must be allowed to compete with material already 
well established as true and useful. The new material is equally true and in 
some cases even more useful than matter usually incorporated in the best of
current courses and textbooks. 

13. Omission of probability theory from texts and teaching. One of the im¬ 
portant weaknesses in much of the current teaching of statistics is a failure to 
make proper use of the theory of probability. Without probability theory, sta¬ 
tistical methods are of only minor value, for although they may put data into 
forms from which intuitive inferences are easy, such inferences are very likely to 
be incorrect. The objective weighing of the degree of confidence to be placed in 
inductive conclusions is necessary to avoid fallacies. Indeed, the whole founda¬ 
tion of descriptive statistical methods, of inductive inference, and of the design 
of experiments, rests upon probability theory. 

The relevance of probability to much statistical work was indeed questioned a 
quarter-century ago by a group of economists impressed by the lack of independ¬ 
ence between consecutive observations, and this attitude, in conjunction with 
an exaggerated and belated remnant of nineteenth-century empiricism, has had 
a certain influence, particularly on the statistical methods in use by economists. 
This view is now rapidly giving way to a tendency to use the powerful new sta¬ 
tistical methods discovered in the meantime. It is now perceived that efficient 
objective methods can be used over a much wider range of cases than was formerly 
supposed, because the independence assumed in their derivations refers not to 
observations but to residuals from the theoretical model used. Furthermore, 
research is under way, and has already achieved promising results, on the exten¬ 
sion of accurate methods to still more extensive classes of problems. 
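
The point that independence is required of the residuals and not of the observations themselves can be illustrated by a small numerical experiment. The following is only a sketch under assumed conditions: the series, its linear trend, the error variance, and the function names are all hypothetical and are not drawn from the report. Observations that follow a trend are strongly correlated with their predecessors, while the residuals from a fitted straight line show little such correlation.

```python
import random
from statistics import fmean


def lag1_autocorrelation(xs):
    """Correlation between consecutive values of a series."""
    m = fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def linear_residuals(ys):
    """Residuals from a least-squares straight line fitted against time."""
    ts = list(range(len(ys)))
    t_bar, y_bar = fmean(ts), fmean(ys)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    intercept = y_bar - slope * t_bar
    return [y - (intercept + slope * t) for t, y in zip(ts, ys)]


# Hypothetical series: a linear trend plus independent normal errors.
rng = random.Random(1)
series = [10.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(200)]

r_obs = lag1_autocorrelation(series)                    # consecutive observations
r_res = lag1_autocorrelation(linear_residuals(series))  # consecutive residuals
print(f"observations: {r_obs:.3f}   residuals: {r_res:.3f}")
```

In this constructed case the dependence between consecutive observations comes entirely from the trend, so a method whose derivation assumes independent residuals about the trend line remains applicable.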

C. PROPER QUALIFICATIONS OF TEACHERS OF STATISTICS 

14. Statistics compared with other subjects. The qualifications appropriate
for teachers of statistical method and theory are not essentially different in degree 
from those for teachers of other subjects in the same institutions; proficiency in
statistical method and theory is merely to be substituted for it in other subjects.
This substitution is, however, vital. It must not be imagined that proficiency 
in some other subject in which statistical methods are used incidentally is equiv¬ 
alent to proficiency in statistical method itself. The error of such a supposition, 
if carried over into another field, might lead to the appointment of a man as pro¬ 
fessor of chemistry on the ground that he could cook. 

The first requisite of the college or university professor of any subject is a pro¬ 
found and thorough knowledge of that subject. It is customary in the better 
institutions at least to restrict appointments to the rank of assistant professor to 
persons who have demonstrated scholarly qualifications by work equivalent to 
that leading to a Ph.D. degree, including an original contribution to the subject 
that the individual is to teach. Promotion to the higher ranks is conditioned 
upon a number of criteria, among which published research is by far the most 
important in those institutions. 

15. Current research in statistical method is essential for teachers. Research 
is even more essential in the teacher of statistics than in teachers of most other 
subjects, because so much remains to be worked out that is of immediate impor¬ 
tance. Some college teachers do no research. This is usually regarded as de¬ 
plorable. The evil is, however, of quite different magnitude according to the 
nature of what is taught by such teachers. In a new subject in which sharp 
differences of opinion exist or have recently existed on fundamental questions, 
in which current discoveries have an important bearing, and in which there have 
not yet been the time and consensus necessary for the preparation of an adequate 
and virtually error-free textbook, teaching without research may have calamitous 
effects. The effective teacher must, of course, have teaching ability, but no 
skill in pedagogy, no lustre of personality, can atone for teaching errors instead 
of truth. Errors are very likely to be taught by those who do no research, and 
then the more skillful the pedagogic indoctrination, the greater the harm. 
Sound educational policy calls for devotion to research of a large fraction of the 
time and energies of the teaching staff in a subject like statistical theory. Stu¬ 
dents also are in particular need of encouragement to do original and critical 
work in relatively new areas of this kind. They must be taught to shun the 
use of formulae and methods given merely on authority without full and con¬ 
vincing reasons, and to insist on looking closely and critically at assertions. 

Even in the teaching of elementary statistical methods for direct practical 
use by specific occupational groups, where it might be thought that the teaching 
would most predominate over the research element, the teacher must face diffi¬ 
cult questions whose answers call for research in statistical theory. Let us 
illustrate this by one example out of the many possible. In teaching the analysis 
of variance for use in agricultural experimentation, questions arising out of the 
possible non-normality of the underlying distributions must be dealt with in 
some way. The formulae, even those in the best textbooks, are accurate only 
if the distribution is normal, and neither this fact nor the non-normality of many 
distributions should be concealed from the students. Obviously something more
needs to be said on the subject at this point. What the teacher can say depends
on how deep he has gone into a whole series of perplexing questions, on some of 
which the views of scholars are not yet stabilized, and on which a tremendous 
amount of research is needed before the maximum practical value can be attained 
for a technique whose usefulness is already amazing. 
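
The question of non-normality can be made concrete with a small sampling experiment. The Python sketch below is an added illustration under stated assumptions, not a procedure taken from this report: it assumes three groups of ten observations each, uses the tabulated 5 per cent point of the variance ratio for 2 and 27 degrees of freedom (about 3.354), and takes the exponential distribution as one arbitrary example of a non-normal population. It estimates how often the variance ratio exceeds the tabulated point when there are in fact no differences between groups.

```python
import random


def f_statistic(groups):
    """Variance ratio (between groups / within groups) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rejection_rate(draw, k=3, n=10, critical=3.354, reps=4000, seed=1948):
    """Fraction of simulated experiments whose variance ratio exceeds `critical`.

    `draw` is a function taking a random.Random instance and returning one
    observation; all groups are sampled from the same population, so every
    rejection is a false alarm.  The default `critical` is the tabulated
    5 per cent point of F with (2, 27) degrees of freedom (an assumption
    tied to the default k=3, n=10).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) > critical:
            rejections += 1
    return rejections / reps


# Normal population: the rate should be close to the nominal 0.05.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
# Skewed (exponential) population: the rate need not equal 0.05.
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0))
print("normal:", normal_rate, "exponential:", skewed_rate)
```

An experiment of this kind shows only what happens for the particular population and sample sizes tried; it does not replace the theoretical research the text calls for.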

16. Minimum requirements in mathematics for the training of teachers
and research men in statistical theory. Because research in the theory of statistics
requires advanced mathematics, and is indeed largely mathematical in
character, a mastery of a substantial amount of higher mathematics must be an
essential part of the training of prospective teachers of statistics. To specify
exactly what or how much mathematics is necessary would be a difficult task.
Something of the algebra of matrices and of the theory of functions is a minimum
necessity, and a good deal of additional knowledge of algebra, geometry, and
analysis adds richness and power to the work of the statistical theorist, the
inventor of new statistical methods. On the other hand, the time of the graduate
student in statistics is much occupied with the theory of statistics itself; and some 
of his time should also go into the study of applied statistics. If the students 
entering a graduate school for advanced work in statistics went there equipped 
with a knowledge of matrix algebra and theory of functions and some additional 
higher mathematics, as is obtainable by undergraduates at some institutions, 
they would have time for applied statistics and could do some real work on 
applications. 

There is a cruel dilemma here, resulting from the delay in learning mathematics 
imposed by the elementary curricula which have become customary in this coun¬ 
try. The weakness of the mathematical element in the prevailing curricula 
affects both teachers and students of statistics to an extent justifying some atten¬ 
tion from those interested in the improvement of statistics. In American uni¬ 
versities elementary calculus is not often taught before the sophomore year, and 
the more advanced parts of algebra come still later, if at all. 

If calculus could be pushed down into the high schools and assumed as a pre¬ 
requisite for college courses in mathematics, statistics, economics, physics and 
several other subjects, the efficiency of instruction in all these departments could 
be increased. For example the difficulties experienced by students of economics 
with ideas of marginal cost, marginal revenue and the like correspond closely 
with the difficulties experienced by mathematicians for centuries in trying to 
define infinitesimals and derivatives, but now successfully overcome. The 
student who really knows differential calculus need not have the slightest diffi¬ 
culty with the marginal ideas of economics. Similarly in physics, the funda¬ 
mental concepts of speed, acceleration, potential theory, conductivity, thermal 
capacity and radiation, are all mathematical and easier to grasp once and for 
all as such than to be learned afresh with each new application from textbooks 
in physics sometimes not clearly written and taught by teachers who must for 
one reason or another avoid a mathematical approach. 

The possibilities of teaching quite advanced mathematics to young children
have scarcely begun to be explored. Children of kindergarten age are fascinated
and thrilled by the wonders of topology. Groups and number theory can be 
tremendous sensations in the fifth grade, though all these subjects are ordinarily 
reserved for graduate students specializing in mathematics. What is lacking 
is teachers who know mathematics and its applications and who possess enough 
freedom to teach what they know instead of the long, dull and relatively useless 
drill on problems of wallpaper hanging and the like, problems turning on mere 
conventions which are quickly forgotten, painful repetitious work which makes 
children resolve to quit mathematics as soon as possible. 

D. NEED FOR RELATING THEORY WITH APPLIED STATISTICS 

17. An example of the interaction between theory and practice. A professor 
of psychology working with mental tests might enlist the assistance of a young 
statistical theorist with mutual benefit. The young man might for a short time 
do some of the drudgery of scoring tests and computing, passing on soon to the 
problems of test construction and the distribution of various functions of cor¬ 
relation coefficients. This last is on a new and exciting frontier of statistical 
theory. The advancement of this frontier, which is really the main business of 
the young man in his capacity as prospective statistical theorist, would in this 
way come to him naturally as a problem or series of problems having a tangible 
meaning additional to its mathematical content. The empirical context is in 
such cases often of great value in suggesting suitable approaches, for example, 
suitable approximations in the study of functions not susceptible to simple 
mathematical representation in terms of elementary functions. 

If the young theorist succeeds in extending the boundaries of multivariate 
statistical analysis by discovering the distribution of some new function of cor¬ 
relation coefficients, the chances are that this discovery will also have applica¬ 
tions in anthropology, medicine, banking, and other pursuits which in the aggre¬ 
gate will greatly outweigh the application originally in view. 

The discovery should be regarded primarily as a contribution to the general 
theory of statistics, and published in a journal devoted to mathematical statistics. 
It will then become available to a wide circle of teachers of statistics, who may 
incorporate it into their courses, and its methods and results will be studied by 
other investigators from the standpoint of possible generalizations and analogs. 
The importance of the discovery would be much more limited if it were thought 
of as a development in psychology and published only in a psychological journal. 
Perhaps dual or multiple publication ought to be permitted in such cases, but the 
first publication should be in a journal of mathematical statistics. Far too many 
good statistical ideas have been buried in connexion with obscure special applica¬ 
tions. 

18. Supplying opportunities for applications in graduate studies of statistics. 

The statistician who does any work in applications must know statistics as an 
art as well as a science. The theoretical statistician, if he wishes to be of the 
utmost use to his colleagues in other disciplines, needs to know by personal
experience something of their lives and collateral problems. Indeed, experience
with applications, and the challenge of problems arising out of applications, have 
played a most important part in the development of statistical theory. It 
follows that the graduate student in statistics needs contact with applied statis¬ 
tics which the institution should undertake to provide, or at least facilitate. 
This need is next in importance after the needs for theoretical statistics and for 
pure mathematics. The distribution of time among the three—theoretical 
statistics, mathematics, and applied statistics—is hard to specify exactly, and 
must in any case depend on the nature of the student’s previous work. If his 
mathematical preparation has been full and rich, more time should be spent on 
applied statistics in his graduate years than if he has already had substantial 
contact with applied statistics in some other way but is deficient in higher 
mathematics. 

Applied statistics entails a somewhat detailed acquaintance with the field of 
application. Such a field might be life insurance, or mental testing, or industrial 
quality control, or sampling in the work of the Bureau of the Census or some 
other government agency; it might be agricultural economics, or business cycles. 
Proficiency in any such field calls for rather prolonged study, and it would be too 
much to expect the embryo statistical theorist to reach this stage of advance¬ 
ment in all subjects. He should, however, make more than a superficial study 
of some chosen field of application. This study might or might not be at the 
university. The requisite familiarity with applied statistics might in some cases 
be acquired by work in a government bureau, or in a research organization study¬ 
ing business cycles or something else involving applied statistics. What is most 
desirable is that the work should have brought the student to the point both of 
applying statistical methods in a reasonably effective way, and of perceiving the 
limitations of existing statistical methods. Perception of existing limitations 
has frequently been the germ of progress in the subject. 

One satisfactory arrangement is an internship in statistical research, as is 
currently provided by some institutions. By this arrangement, interns work 
under competent leadership in various government or private agencies that are 
engaged in large-scale statistical studies. The interns do research in theory, 
adapt the physical circumstances to theory and vice versa, and have actual 
practice in the design of experiments, construction of questionnaires, writing 
of instructions and tabulation plans, analysis of the results and appraisal of 
sampling variances. 

E. RECOMMENDATIONS ON THE ORGANIZATION OF STATISTICAL TEACHING
AND RESEARCH IN INSTITUTIONS OF HIGHER LEARNING 

19. Research should be encouraged; teaching schedules should not be overloaded.
Colleges and universities usually expect the members of their faculties
to engage in research as well as in teaching, the relative emphasis on these two 
functions varying greatly from institution to institution and to a lesser extent 
among departments within the same institution. Reasons why teachers of
statistics must do current research in order to teach the subject have already
been given in Art. 15. In the organization of statistical teaching it is thus of 
extraordinary importance that colleges and universities emphasize research in 
the theory of statistics as a leading part of the work of the teaching staff in this 
field. Hours of teaching and other duties must be kept within such bounds as 
to make research possible, the initial selection of teachers must be of persons 
capable of research in statistics, and there must be provision of needed secretarial, 
computational and other assistance. The library must be adequate, not only in 
publications containing statistical theory, but in the larger field of pure mathemat¬ 
ics as well. 

20. Organizing statistical service in the university. In addition to the cus¬ 
tomary duties of teaching and research, faculty members expert in statistical 
methods find that they cannot escape a third, viz., advice to their colleagues and 
others regarding the statistical aspects of their problems. This often takes a 
good deal of time. Clearly it is in the interest of the academic enterprise that 
such services be provided. Scholars in many departments are finding that their 
work is greatly improved by competent statistical advice not only in the inter¬ 
pretation of their data but also in the design of their experiments and other 
investigations. The provision of competent advice frequently requires extended 
consideration of the general content of the problem as well as special analysis of 
its statistical features. And initial advice often needs to be supplemented by 
further service. The statistician, like the physician, often finds that one inter¬ 
view at which a prescription is dispensed does not end the matter satisfactorily. 

Teaching hours must be distinctly limited if statisticians are to be able to 
render this service to the rest of the institution as well as maintain a high level 
of research in their own field. 

One way to handle the problem of statistical service, especially in a large 
institution, is through a special organization devoted to this purpose. Such an 
organization, whether called a Statistical Institute, a Department of Applied 
Statistics, Statistical Laboratory, or something else, might supply not only 
advice but a more active kind of assistance, including computational and chart¬ 
drawing services. 

A statistical service organization should be removed from the teaching of statis¬ 
tics only to the extent necessary to gain the advantages of some degree of special¬ 
ization and to prevent undue interruption of the teacher’s other work of teaching 
and of research in theory. There are distinct advantages for all parties in a 
fairly close connexion between practical statistical work, research in statistical 
theory, and statistical teaching. Each of these activities benefits the others, 
provided only that it does not take away from it too much time. Research in 
statistical theory, like medical research, needs frequent revitalizing injections of 
specific practical problems. It also needs the stimulus of contact with students. 
The teaching of statistical method is made more vigorous both by research in
the subject and by the presence of applications with which students can be confronted.
And the needs of applications are better met if through an organization
such as is here envisaged they can be brought to the attention of appropriate
specialists, and if also students can be enlisted when needed for their treatment. 

A university organization dealing with statistics may properly comprise two 
parts with overlapping personnel, one devoted chiefly to applied statistics, the 
other to theoretical statistics. The teaching might be done by both, but at 
least at the more advanced levels would be primarily the concern of the theoreti¬ 
cal part. Migration between the two ought to be easy and frequent, though some 
individuals are so definitely adapted to one kind of work or the other as to make 
it undesirable to have fixed rules calling for periodic transfers. 

In smaller institutions it may not be practicable to have statistical organiza¬ 
tions sufficiently well staffed to provide adequate consulting service. To meet 
the needs in some of these cases regional centres for advice and service in applied 
statistics might be established at large universities throughout the country, 
with access made readily available for sister institutions. These centres might 
also carry on work in applied statistics in behalf of government agencies and other 
organizations, much as various agricultural colleges have for years been carrying 
on cooperative work with the federal Department of Agriculture. 

The question how far, if at all, such a university centre of applied statistics
should go into the market place and engage commercially in service to business
concerns is a debatable one. While there may be favorable reactions upon
scientific work, there are grave dangers to the intellectual integrity of the
institution which need serious consideration.

21. Organization for teaching. Passing from questions of personnel and the 
research and service functions of academic statisticians to teaching itself, we have 
to consider problems of departmental organization, of course contents, of systems 
of prerequisites, and of methods of teaching. All these we consider secondary 
problems, not in the sense of being unimportant, but because we believe that 
proper solutions of them will be reached with reasonable promptness when 
personnel of the kind described in Sec. C of this report are at work in some such 
general setting as has just been described. The ideas recorded below are general 
in character and are to be regarded as a starting-point for developing a program 
in a particular institution, once suitable faculty members have been obtained. 

The teaching of statistics may be organized in any of the following ways: 

a. In a department of theory and a department of applied statistics, both 
forming an Institute of Statistics. 

b. In a single Department of Statistics. 

c. Under an inter-departmental committee. 

d. Under the exclusive jurisdiction of the Department of Mathematics. 

e. It may be scattered among heterogeneous departments of application, 
without formal coordination. 

Only a few large institutions will be in position to adopt the first plan. It is
likely that the second will be most suitable for the majority. The third should
probably be regarded as a makeshift for the transitional period until a proper
department of statistics can be organized, a step that will not at the moment be
reasonably possible for most institutions because the right kind of scholarly
personnel does not exist in adequate numbers. It is of course possible that some
vestige of an inter-departmental committee, perhaps in the form of an Advisory
Board, might be a useful adjunct of a department of statistics in order to keep it 
informed of the needs of applications. It is also possible that something of the 
sort might function with respect to a department of mathematics, or any other 
department. On the other hand, the desired consultations and adjustments 
might be accomplished in less formal ways. 

To make statistics a subdivision of a mathematics department is a solution 
that will appeal to administrators desirous of keeping down the number of de¬ 
partments. The subject-matter of statistics is to a sufficient extent mathemati¬ 
cal to give some apparent weight to this plan, and some mathematicians have the 
unsound idea that any mathematician can teach statistics without specialized 
study or experience in application. On the other hand, statistics has some 
features uncongenial to traditional mathematics, arising partly from the urgency 
of practical needs which go beyond what can immediately be provided by rigor¬ 
ous mathematical theory. Again we may cite the problem in the teaching of the 
analysis of variance of what to do about possible non-normality of the underlying 
distribution (Art. 15). The user of this technique has the responsibility of 
verifying that the situation conforms to the assumptions, including that of normality,
underlying the tabulated probability criteria. But he is in a very poor
position to do this in a large proportion of the applications actually made of the 
analysis of variance. Yet the analysis of variance in some form—possibly 
through the use of rank-order numbers or through a transformation or some other 
auxiliary device—remains the one powerful means of attacking a very large and 
important class of practical situations. The practicing statistician needs to do 
some highly educated guessing on such matters—guessing that will be assisted 
but not made determinate by knowledge of a considerable range of mathematical 
truths regarding approaches to the normal distribution, moments of the variance- 
ratio in samples from non-normal populations, asymptotic large-sample theory, 
and other such topics. His mathematical insight needs to be supplemented by 
consideration of the particular subject-matter of application. Moreover, it is 
desirable that students of statistics have some practice with actual empirical 
data designed to develop the art of guessing in such ways. 

Another example of non-rigorous mathematics used extensively in statistics is 
the whole business of asymptotic standard errors found by the differential 
method. It is desirable that good mathematics replace bad in such connexions, 
but something is to be said for the position into which so many practical statis¬ 
ticians have been driven, that even bad mathematics may be better than none 
at all. The requisite good mathematics along these lines can come only through 
those who have made really serious studies of statistics, though a sufficiently 
interested pure mathematician might eventually be led by such a student of 
statistics to undertake and complete the necessary research. Practical needs 
make approximations necessary; the goodness of a particular approximation can
often be judged adequately by a statistician familiar with the particular application
long before the heavy artillery of advanced mathematical analysis can be
brought up. 
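
For reference, the "differential method" mentioned above is usually identified with what is now called the delta method. Its simplest one-parameter form can be sketched as follows; the notation ($T$ an estimator of $\theta$, $g$ a differentiable function) is assumed here for illustration and is not taken from this report:

```latex
g(T) \approx g(\theta) + g'(\theta)\,(T - \theta)
\quad\Longrightarrow\quad
\operatorname{Var}\, g(T) \approx \left[g'(\theta)\right]^{2} \operatorname{Var}(T),
\qquad
\mathrm{s.e.}\, g(T) \approx \left|g'(\theta)\right| \, \mathrm{s.e.}(T).
```

The step from the first relation to the second discards the higher-order terms of the expansion without bounding them, which is the sense in which the method is not rigorous as it stands.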

The teacher of statistics must have a genuine sympathy and understanding for 
applications, and these are not possessed by many pure mathematicians, at least 
in the opinion of some of those concerned with the applications; and it is this 
opinion rather than the possible fact that is of interest at the moment. For so
long as such an opinion is maintained, for example by psychologists and economists,
these specialists will be suspicious that courses in statistics given by a
department consisting largely of pure mathematicians are unsuitable for their
purposes. The result is likely to be a sabotaging of attempts at centralization, 
the different departments reverting to the old and ultimately objectionable 
system of teaching their own separate courses in statistical methods. 

These difficulties are not necessarily insuperable, and it is to be expected that 
many medium-sized and small institutions will make their mathematical departments
responsible for statistical teaching. But this ought not to be done without
a consideration of the possible dangers.

22. The statistical curriculum. We next consider curricular problems. These 
may be divided into those of the graduate school and those of the undergraduate 
college. Those of the graduate school may in turn be divided into those of 
specialization in statistics and of auxiliary teaching of statistics to students in 
other departments, such as sociology, who need to use statistical methods, have 
not studied them sufficiently as undergraduates, and cannot afford to put much 
time on them. Of these two subdivisions the number of students at present is 
greater in the second and the ultimate importance is greater in the first, because 
the whole future of statistics depends on improvement and enlargement of this 
graduate teaching. 

The incidental teaching of elementary statistical methods to graduate students 
in other subjects, without any prerequisite in mathematics or statistics, cannot 
equip these students with a command of the subject at all comparable to that 
which could be obtained by a better integration of undergraduate with graduate 
work. A prospective sociologist, economist, psychologist, or physicist ought to 
study elementary statistical methods and concepts while still an undergraduate,
and without special reference to his ultimate field of specialization. 

The features of statistical methods peculiar in their applications, beyond what 
is taught through illustrations and exercises in an elementary course, may be 
fit material for a course, graduate or undergraduate, in a department of the 
application. Such a course should require as a prerequisite an elementary course 
in a department of statistics, or at least one taught by specialists in statistical 
method and theory. 

For the undergraduate college, in place of the sporadic offerings now current in 
different departments, we recommend a combination of two general fundamental 
courses with a number of advanced courses. Of the latter some will be specialized
to the work of particular departments or groups of departments.

Of the two fundamental courses one will require calculus as a prerequisite, the
other only a knowledge of first-year algebra. It is to be hoped that the less 
mathematical of these two general statistical courses, instead of being elected by 
a majority of students, will gradually approach extinction, while the course based 
on calculus will become the vital point of contact of the student body with the 
concepts of statistics. The chief reason for insisting upon the importance of 
calculus as a prerequisite is simply the possibility of covering important statistical 
theory that is inaccessible to those who do not have it. 

Modern statistical methods are based on the theory of probability. The 
general courses in statistics may therefore well begin with elementary probability. 
The duality between probability and statistical concepts, 7 for example between 
probability and relative frequency, between mathematical expectation and a 
sample mean, between parameter and statistic, should be explained. Deriva¬ 
tions and the place of the normal distribution should be sketched, and the Student 
distribution should be derived and applied to a variety of problems in the first 
course based on calculus. Later courses given by the department of statistics, 
or whoever specializes in statistical theory, will naturally cover other statistical 
methods and theories. At the same time useful courses can be offered in eco¬ 
nomic statistics, mental testing, and other fields using statistical methods by 
specialists, regardless of departmental affiliation. There might be departmental 
cooperation; for example, the department of statistics might offer elementary and 
advanced courses in correlation and multivariate analysis, and the department of 
psychology might require these as prerequisites for some of its work in mental 
testing.
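
The duality between probability and relative frequency, and between mathematical expectation and a sample mean, can be shown to a class with a very small sampling experiment. The following Python sketch is an added illustration only; the fair six-sided die, the number of throws, and the variable names are all assumptions chosen for the example.

```python
import random

rng = random.Random(0)            # fixed seed so the experiment can be repeated
rolls = [rng.randint(1, 6) for _ in range(60000)]

# Parameter side: quantities defined by the probability model.
probability_of_six = 1 / 6
expectation = sum(range(1, 7)) / 6        # 3.5 for a fair die

# Statistic side: the corresponding quantities computed from the sample.
relative_frequency_of_six = rolls.count(6) / len(rolls)
sample_mean = sum(rolls) / len(rolls)

print("P(six) =", probability_of_six, " relative frequency =", relative_frequency_of_six)
print("E(X)   =", expectation, " sample mean =", sample_mean)
```

Each statistic varies from sample to sample about the corresponding parameter, which stays fixed; changing the seed makes this visible.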

The teaching of statistics should be accompanied by considerable work in 
applied statistical problems, as well as exercises in mathematical theory, on the 
part of the students. A large part of this work in applied statistics is best con¬ 
ducted in a laboratory equipped with calculating machines, mathematical tables, 
drafting instruments, and other appurtenances. 

Statistical laboratories require supervision, administration and maintenance. 
They are needed not only for the purpose of teaching statistics, pure and applied, 
at all levels, but also by research workers in many fields. There are possible 
gains of efficiency and economy in a centralized administration of them. One 
suggestion is that they be under the supervision of the university library. 
Another is that responsibility for them be lodged in a central department of 
statistics, or in a two-department statistical institute. Centralization can be 
carried too far, and it is likely that some units in a large organization will find 
it advantageous to have machines which are exclusively their own. The con¬ 
flicting claims regarding machines and laboratories will require careful weighing. 

23. Statistical method as a part of a liberal education. A question may also
be raised as to whether some work in the statistical method should not be required
of all college students as a part of a liberal education. This would be
a novel step, but has much to be said for it in view of the widespread use of
statistics and growing interest in statistics. Another point is that the student
who can’t make up his mind as to his ultimate field of specialization or vocation
will do well to study those things that can be used in many fields. Of such things,
mathematics and statistics are leading examples. There are more or less sound
objections to systems of required studies; but if we are to have them, the claim
of statistics should not be rejected merely on grounds of novelty.

7 Cf. the article “Frequency distribution,” Encycl. of the Social Sciences (1931).



ABSTRACTS OF PAPERS 

Presented December 22, 1947 at the Berkeley Meeting of the Institute


1. The Performance Characteristic of Certain Methods for Obtaining Confidence 
Intervals. B. M. Bennett and J. Neyman, University of California, 
Berkeley. 

Certain methods for obtaining confidence limits have been introduced by Bliss, R. A.
Fisher and Paulson. Thus, e.g., let $(x_i, y_i)$, $(i = 1, \cdots, n)$, represent a sample from a bivariate
normal population with means $E(x_i) = \xi$, $E(y_i) = \alpha\xi$ and variances and covariance
$\sigma_x^2, \sigma_y^2, \sigma_{xy}$. If $\bar{x}, \bar{y}, s_x^2, s_y^2, s_{xy}$ are the sample means, variances and covariance
respectively, then in order to determine confidence limits for $\alpha$, the ratio:

$$u = \frac{\sqrt{n}\,(\bar{y} - \alpha\bar{x})}{\sqrt{s_y^2 - 2\alpha s_{xy} + \alpha^2 s_x^2}}$$

may be referred to the appropriate value $t_\epsilon$ of the Student-t distribution. The inequality
$|u| < t_\epsilon$ may, in general, be solved as a quadratic equation in $\alpha$ to yield two values $\alpha_1, \alpha_2$
which are presumed to be confidence limits for $\alpha$. In this paper the probability $\pi$ of being
correct in using such a procedure, i.e., the performance or operating characteristic, is computed
in the limiting case when $\sigma_x^2, \sigma_y^2, \sigma_{xy} = \rho\sigma_x\sigma_y$ are assumed to be known. It is shown that
$\pi$ is a function $\pi(\alpha, \xi, \sigma_x^2, \sigma_y^2, \rho)$ of all the parameters, and in particular of $\alpha$ itself, the quantity
for which confidence limits are supposed to be provided. Similar “quadratic” methods
are also used in certain regression problems, e.g., in determining confidence limits for a
value of $x$ corresponding to an additional value of $y$ when a previous sample regression of $y$
on $x$ is available; or in determining confidence limits for the intersection point of two population
regression lines. The performance characteristic of each of these methods is shown
to be a function of the quantity for which the method gives confidence limits.
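
The "quadratic" procedure described in this abstract can be sketched in a few lines. The Python below is an added illustration and not the authors' computation: it assumes the ratio has the usual form u = sqrt(n)(ybar - alpha*xbar) / sqrt(s_y^2 - 2*alpha*s_xy + alpha^2*s_x^2), with sample variances and covariance computed on n - 1 degrees of freedom, and it takes the Student-t value `t_crit` as an input to be read from tables. The function names are hypothetical.

```python
import math


def sample_moments(xs, ys):
    """Sample size, means, and (n - 1)-divisor variances and covariance."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    syy = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    return n, xbar, ybar, sxx, syy, sxy


def u_ratio(xs, ys, alpha):
    """The ratio u for a trial value of alpha (assumed form, see lead-in)."""
    n, xbar, ybar, sxx, syy, sxy = sample_moments(xs, ys)
    denom = math.sqrt(syy - 2 * alpha * sxy + alpha ** 2 * sxx)
    return math.sqrt(n) * (ybar - alpha * xbar) / denom


def quadratic_limits(xs, ys, t_crit):
    """Solve |u| <= t_crit for alpha.

    Squaring gives a*alpha^2 - 2*b*alpha + c <= 0.  Returns the two roots
    when they bound a finite interval, otherwise None.
    """
    n, xbar, ybar, sxx, syy, sxy = sample_moments(xs, ys)
    t2 = t_crit ** 2
    a = n * xbar ** 2 - t2 * sxx
    b = n * xbar * ybar - t2 * sxy
    c = n * ybar ** 2 - t2 * syy
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return None          # solution set is not a finite interval
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


# Made-up data for illustration; 2.571 is the two-sided 5 per cent
# Student-t point for 5 degrees of freedom.
xs = [4.1, 5.2, 3.9, 4.8, 5.5, 4.4]
ys = [8.0, 10.9, 7.5, 9.4, 11.3, 9.0]
limits = quadratic_limits(xs, ys, 2.571)
print(limits)
```

The `None` branch corresponds to samples in which the mean of x is not clearly different from zero, so that the inequality does not define a bounded interval for alpha.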

2. Some Further Results on the Bernoulli Process. T. E. Harris, Douglas 
Aircraft Co. 

Let $z_1, z_2, z_3, \cdots$, be a sequence of random variables defined as follows: $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \cdots, k$. If $z_n = 0$, $z_{n+1} = 0$. If $z_n = r$, $r \neq 0$, then $z_{n+1}$ is distributed as the
sum of $r$ independent random variables, each having the same distribution as $z_1$. It is
assumed that $x < 1$, where $x = E(z_1)$. Let $N$ be the smallest value of $n$ such that $z_{n+1} = 0$.
A method is given for obtaining an expansion of the moment-generating function of $N$.
In the case where $p_r = 0$ for $r \geq 3$, this expansion takes the form $1 + (1 - e^{-s})(1 - p_0)F(s)$,
where $F(s) = f_1(s) - p_2(1 - p_0) f_2(s) + 2x p_2^2 (1 - p_0)^2 f_3(s) - \cdots$, where $f_1(s) =
(e^{-s} - x)^{-1}$, and $f_n(s) = f_{n-1}(s)(e^{-s} - x^n)^{-1}$. Certain restrictions on the constants $p_r$
insure that this expansion converges for a complex neighborhood of $s = 0$.
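
The expansion in this abstract can be checked numerically. The Python sketch below is an added illustration, not the author's method. It reads the expansion as 1 + (1 - e^{-s})(1 - p_0)F(s) with F(s) = f_1(s) - p_2(1 - p_0)f_2(s) + 2x p_2^2(1 - p_0)^2 f_3(s) - ..., and it assumes that N is counted from n = 0 (so that N = 0 when z_1 = 0), which is the convention under which the case p_2 = 0 reduces to the geometric distribution. The function names and numerical values are hypothetical.

```python
import math


def exact_mgf(p1, p2, s, n_terms=400):
    """E[exp(s*N)] by direct summation, for offspring probabilities
    p0 = 1 - p1 - p2, p1, p2 (and p_r = 0 for r >= 3).

    u_n = P(z_n != 0) satisfies u_{n+1} = x*u_n - p2*u_n**2 with u_0 = 1,
    which is 1 - g(1 - u_n) for g(t) = p0 + p1*t + p2*t**2, and
    P(N = n) = u_n - u_{n+1}.  Requires x*exp(s) < 1 for convergence.
    """
    x = p1 + 2 * p2
    u = 1.0
    total = 0.0
    for n in range(n_terms):
        u_next = x * u - p2 * u * u
        total += math.exp(s * n) * (u - u_next)
        u = u_next
    return total


def truncated_expansion(p1, p2, s):
    """First three terms of the expansion as read from the abstract."""
    p0 = 1 - p1 - p2
    x = p1 + 2 * p2
    e = math.exp(-s)
    f1 = 1 / (e - x)
    f2 = f1 / (e - x ** 2)
    f3 = f2 / (e - x ** 3)
    big_f = f1 - p2 * (1 - p0) * f2 + 2 * x * p2 ** 2 * (1 - p0) ** 2 * f3
    return 1 + (1 - e) * (1 - p0) * big_f


# Illustrative values: p0 = 0.5, p1 = 0.4, p2 = 0.1, so x = 0.6 < 1.
direct = exact_mgf(0.4, 0.1, -0.5)
series = truncated_expansion(0.4, 0.1, -0.5)
print(direct, series)
```

With p_2 = 0 only the first term of F(s) survives and the two computations agree to rounding error; with p_2 > 0 the three-term truncation differs from the direct sum by roughly the size of the first omitted term.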

3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions.
E. L. Lehmann and C. M. Stein, University of California, Berkeley,
California.

Critical regions are determined for testing a composite hypothesis, which are most power¬ 
ful against a particular alternative among all critical regions whose probabilities under the 
hypothesis tested are bounded above by the level of significance. These problems have 
been considered by Neyman, Pearson and others, subject to the condition that the critical 
region be similar. In testing the hypothesis specifying the value of the variance of a normal
distribution with unknown mean against an alternative with larger variance, and in some
other problems, the best similar region is also most powerful in the sense of this paper. 
However, in the analogous problem when the variance under the alternative hypothesis 
is less than that under the hypothesis tested, in the case of Student’s hypothesis when the 
level of significance is less than and in some other cases, the best similar region is not 
most powerful in the sense of this paper. There exist most powerful tests which are quite 
good against certain alternatives in some cases where no proper similar region exists. 
These results indicate that in some practical cases the standard test is not best if the class 
of alternatives is sufficiently restricted. 

4. On the Selection of Forecasting Formulas. Paul G. Hoel, University of
California, Los Angeles, California.

Given two competing formulas, $u = g(z_1, \cdots, z_n)$ and $v = k(z_1, \cdots, z_n)$, for forecasting
a variable $x$, a significance test possessing optimum properties is designed for deciding
whether one formula yields significantly better forecasts than the other. The test, which
turns out to be a Student $t$ test, is constructed as a test of the hypothesis $H_0: m_i = u_i$ against
the alternative $H_1: m_i = v_i$, $(i = 1, \cdots, n)$, in which it is assumed that the variables
$x_1, \cdots, x_n$, corresponding to the $n$ samples, are independently normally distributed with
means $m_i$ and variances $\sigma_i^2 = \sigma^2$.

5. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher
Problem. J. E. Walsh, Douglas Aircraft Company

The most powerful t-test solution of the Behrens-Fisher problem (one-sided and symmetrical)
was obtained by Scheffé in Annals of Mathematical Statistics, Vol. 14 (1943), pp.
35-44. This note derives (approximately) the power efficiency of this t-test for the case
in which the ratio of the variances of the normal populations is also known. Let the t-test
be based on $m$ sample values from the first normal population and $n$ sample values from the
second normal population, where $m < n$. For fixed values of $m$ and $n$, a symmetrical
t-test with significance level $2\alpha$ has the same power efficiency as a one-sided t-test with
significance level $\alpha$. For one-sided t-tests with significance level $\alpha$, the power efficiency
is approximately $50[B + \sqrt{B^2 - 8(m + n)A}]/(m + n)$, where $B = 2 + (m + n)A + K_\alpha/2$,
$A = 1 - K_\alpha/2(m - 1)$, and $K_\alpha$ is the standardized normal deviate exceeded with probability
$\alpha$. This approximation is reasonably accurate for $m \geq 4$ if $\alpha = .05$, $m \geq 6$ if $\alpha = .025$,
$m \geq 6$ if $\alpha = .01$, $m \geq 7$ if $\alpha = .005$. Intuitively the power efficiency of a test measures
the percentage of available information per observation which is utilized by that test.

6. On Sequences of Experiments. Charles Stein, University of California,
Berkeley, California.

One performs a sequence of N experiments to decide between two simple hypotheses 
regarding probability distributions of certain observable quantities. At each stage there 
is a choice among L experiments and the one chosen yields a random variable. One wishes 
to achieve certain upper bounds $\alpha$ and $\beta$ to the probabilities of first and second kind errors
respectively, and, subject to these restrictions, to minimize the expected cost under a third 
hypothesis. The cost of each particular sequence of experiments is known. A solution 
is obtained, essentially by applying Lagrange’s method and working back from the end 
of the experiment. This can be generalized to multiple decision problems. The results 
are applied to two-sample tests with the second sample of variable size, and to Wald’s 
sequential analysis. As another problem, suppose $(X_1, Y_1), (X_2, Y_2), \cdots$ are independently
distributed with bivariate normal distributions having mean $\xi$ and covariance matrix
$\Sigma$, both unknown. One tests $H_0: \xi = 0$ against $H_1$: … A test (not necessarily
optimum) valid within the usual approximation is obtained from the ratio of the p.d.f.
of Hotelling’s $T^2$ under $H_1$ to that under $H_0$. Analogous results hold for the multiple
correlation coefficient, ratio of two variances and test for linear hypothesis.

7. The Effect of Selection Above Definite Lower Limits of Linear Functions of 
Normally Distributed Correlated Variables on the Means and Variances of 
Other Linear Functions. G. A. Baker, University of California, Davis,
California. 

Sometimes certain variables in a system can be observed before other economically or 
socially important variables. These variables or linear combinations of them can be used 
as a basis of selection at given levels. The question is: How does selection on these earlier
or more easily available variables affect the mean and variance of the economically or socially
more important variables or, perhaps, linear functions of the more important variables?
The general procedure is clear. We transform to a new system of variables which
contains the linear functions on which selection is performed and the linear functions of
which the means and variances are required as separate variables. The remaining new
variables are eliminated by integration. The final calculation involves the numerical
evaluation of integrals whose integrands are the product of polynomials and normal multivariate
functions and whose limits depend on the given levels of selections. The general
ideas are simple but the actual labor of computation in a given case is tedious. An example 
is considered in detail. 


8. An Inversion Formula for the Distribution of a Ratio of Random Variables.
J. Gurland, University of California, Berkeley, Calif.


The repeated Cauchy principal value of integrals applied to characteristic functions is
used in obtaining inversion formulae for distribution functions. Let the random variables
$X_1$ and $X_2$ have a joint distribution function with corresponding characteristic function
$\phi(t_1, t_2)$. Suppose $P\{X_2 \leq 0\} = 0$. Let
$\fint g(t)\,dt = \lim_{\epsilon \to 0,\, T \to \infty} \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) g(t)\,dt$
for any function $g(t)$. If $G(x)$ is the distribution function of $X_1/X_2$ then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\phi(t, -tx)}{t}\,dt.$$

This formula is free of restrictions which accompany the formula
given by Cramér in the case where $X_1$ and $X_2$ are independent; and differentiation extends
a result of Geary to a much larger class of distribution functions. Further generalizations
of the theory are obtained, and as an example the distribution function of the ratio of quadratic
forms of random variables $X_1, X_2, \cdots, X_n$ is considered in the case where $X_1, X_2, \cdots, X_n$
have a multivariate normal distribution.


9. Independence of Parameters and Sufficient Statistics. E. W. Barankin, 
University of California, Berkeley, California.

The notions of complete set of independent parameters and minimal set of sufficient statistics
are suitably defined for a class of families of probability densities
$\{p(x_1, \cdots, x_n; \theta_1, \cdots, \theta_s)\}$, and the order of each of these sets is determined as the rank of a certain
matrix. Second order continuous differentiability is eventually required of the function p ; 
and certain other conditions are laid down, designed to ensure that the behavior of p in
the large is similar to its behavior in the small when only continuous differentiability is 
assumed. The problem of determining the order of a minimal set of sufficient statistics 
is made, by a certain device, to become identical in character with that of finding the order
of a complete set of independent parameters. (This is in the nature of these concepts.)
An explicit method is given for finding a complete set of independent parameters and a
minimal set of sufficient statistics.


(Presented December 30, 1947 at New York at the Annual Meeting of the Institute)

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from 
a Fitted Fourier Series (Preliminary Report). R. L. Anderson, University
of North Carolina, Raleigh, North Carolina and T. W. Anderson, Columbia 
University. 

Given a set of $N$ observations $\{X_i\}$, which are defined as follows:

$$X_i - m_i = \rho(X_{i-L} - m_{i-L}) + u_i,$$

where the residuals $\{u_i\}$ are assumed to be normally and independently distributed with
zero means and equal variances and $L$ is the lag. A statistic for testing the null hypothesis
$\rho = 0$ is ${}_L R$, the circular serial correlation coefficient of residuals $e_i$ from a regression
line fitted by least squares: $X_i = M_i + e_i$. The following regression line is considered:

$$M_i = a_0 + \sum_k \left( a_k \cos \frac{2\pi k i}{N} + b_k \sin \frac{2\pi k i}{N} \right),$$

where $k$ ranges over some subset of the integers $1, 2, \cdots, \frac{1}{2}(N - 1)$ or $\frac{1}{2}N$, depending
on whether $N$ is odd or even (if $N$ is even, $b_{N/2}$ is not used). Hence ${}_L R$ is defined as:

$${}_L R = \frac{e_1 e_{1+L} + e_2 e_{2+L} + \cdots + e_N e_{N+L}}{\sum_i e_i^2},$$

with $e_{N+i} = e_i$.

The distribution of this ${}_L R$ has the same general form as that presented by R. L. Anderson
for $\rho = 0$ [“Distribution of the serial correlation coefficient,” Annals of Math. Statistics
13:1-13 (1942)]; and for $\rho \neq 0$ by W. G. Madow [“Note on the distribution of the serial
correlation coefficient,” Annals of Math. Statistics 16:308-310 (1945)].

For $M_i$ consisting of terms of only one period, $N/k = 2, 3, 4, 6, 12$ and $24$, exact values
of the 1% and 5% significance levels of ${}_1 R$ have been computed for $N = 12$ and $24$. Approximate
significance levels have been computed for $N = 12(12)96$. More of the exact
significance levels are being computed, and all computations will be extended to include
some multiple periods and some lags greater than 1.

2. Some New Methods for Distributions of Quadratic Forms. Harold
Hotelling, Institute of Statistics, University of North Carolina, Chapel Hill.

Any homogeneous quadratic form in normally distributed variates of zero means has
the same distribution as $q = \frac{1}{2}(a_1 x_1^2 + \cdots + a_n x_n^2)$, where the $a_i$ are roots of a determinantal
equation based on the coefficients of the given form and the parameters of the normal
distribution, and where the $x_i$ are normally and independently distributed with zero means
and unit variances. We take $\sum a_i = n$, and begin by expanding the distribution of a positive
definite form in a series of powers of $q$ whose coefficients are polynomials in the reciprocals
of the $a_i$. This series shows the analyticity of the function, which is then expressed as
the product of a $\chi^2$ distribution function and a series of Laguerre polynomials with coefficients
which are simple polynomials in the moments of the $a_i$. Indefinite forms and certain ratios
of forms are dealt with by convolutions of these series and by other means.





3. Frequency Functions Defined by the Pearson Difference Equation. Leo
Katz, Michigan State College, East Lansing, Michigan.

Frequency “links” formed from the Pearson difference equation provide an efficient 
means of fitting functions to observed distributions. These links, involving three constants 
which are determined by the first four moments of the observed series, correspond to a 
three-parameter family of discrete frequency functions. This family of functions is just 
as broad as that defined by the differential equation, containing functions of equally diverse 
types; in addition, it has the very important advantage that the graduation process is the
same for any type. Further, the simpler functions of the family all correspond to points
lying in one plane of the parameter space. This plane, giving a two-parameter family
of functions (depending upon the first three moments), is studied intensively, rather complete
results being obtainable for areas, moments, sampling characteristics of moments,
etc. It is also shown that the problem of discrimination among simple discrete frequency
functions for graduating observed data is resolvable (in the plane) to the sampling distribution
of one statistic. A special case of the two-parameter family depending on only the
first two moments was previously discussed. 

4. Distribution of the Sum of Roots of a Determinantal Equation under a
Certain Condition. D. N. Nanda, Institute of Statistics, University of North
Carolina, Chapel Hill.

Let $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ be two $p$-variate sample matrices with $n_1$ and $n_2$ degrees
of freedom. Then $S = xx'/n_1$ and $S^* = x^* x^{*\prime}/n_2$ are, under the null hypothesis, independent
estimates of the same population covariance matrix. The distribution of a root, specified
by its rank order, of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$
and $B = n_2 S^*$, has already been given by S. N. Roy, and by the author, who has also obtained
the limiting distribution of any root when one of the samples becomes infinitely
large. The moment generating function of the sum of the roots when $n_1 = p \pm 1$ can be
derived from the limiting distribution of the largest root. The probability distributions
of the sum of roots under this condition have been formulated for the determinantal equations
having two, three, and four roots. The moments of these distributions have also
been obtained. The method is applicable for the determinantal equation of any order.
These probability distributions can easily be tabulated, as they involve only simple algebraic
and incomplete beta functions.

5. Applications of Carnap’s Probability Theory to Statistical Inference.
Gerhard Tintner, Iowa State College, Ames, Iowa.

The new theory of probability of Rudolf Carnap (“On inductive logic,” Philosophy of
Science, vol. 12, 1946, pp. 72 ff. “The two concepts of probability,” Philosophy and Phenomenological
Research, vol. 5, 1944, pp. 513 ff.) introduces a distinction between probability$_1$,
the degree of confirmation, and probability$_2$, related to relative frequency. It is
believed that the ideas developed are useful in clarifying the problems of statistical inference.

As an example, consider the case of “inverse inference,” i.e. inference from a sample to
the population. The evidence is that in a sample of size $s$ there are $s_1$ individuals with
a certain property $M$ and $s_2 = s - s_1$ without the property. The hypothesis is that in the
population consisting of $n$ individuals there are $n_1$ individuals with property $M$ and
$n_2 = n - n_1$ individuals without this property. The degree of confirmation is then:





In this formula we have: $w_1$ the logical width of the property $M$, $w_2$ the logical width of
the property non-$M$, $k = w_1 + w_2$. It should be noted that for $w_1 = w_2 = 1$ the formula
becomes the classical result, i.e. a term of the hypergeometric distribution.

This idea may be applied to statistical estimation. We could for instance choose $n_1$
in such a fashion that $c^*$ becomes a maximum. This would be estimation by the principle
of maximum degree of confirmation, analogous to maximum likelihood. In a similar fashion
we may also use $c^*$ to establish limits for $n_1$ similar to confidence or fiducial intervals.

6. Circular Probable Error of an Elliptical Gaussian Distribution. Hallett H.
Germond, S. W. Marshall & Co., Consulting Engineers, Washington, D. C.

Preliminary tables are presented, giving the radii of distribution-centered circular 
cylinders enclosing various percentages of the volume under an elliptical bivariate Gaussian 
surface. These tables are further interpreted in terms of a correlated bivariate Gaussian 
distribution. The application of these tables to impact analysis is illustrated. 
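
A table entry of the kind described above can be approximated numerically. The following is a minimal Python sketch, assuming independent zero-mean normal coordinates with standard deviations `sigma_x` and `sigma_y` along the principal axes (a correlated distribution would first be rotated to these axes). It integrates the density over a centered circle, with the radial integral done in closed form and the angular integral by the midpoint rule, and then solves for the radius by bisection. This is not Germond's method of tabulation, and the function names are hypothetical.

```python
import math


def prob_within_radius(r, sigma_x, sigma_y, steps=720):
    """P(X^2 + Y^2 <= r^2) for independent zero-mean normal X and Y."""
    total = 0.0
    for j in range(steps):
        theta = 2.0 * math.pi * (j + 0.5) / steps
        # Along direction theta the density is proportional to exp(-a * rho^2).
        a = (math.cos(theta) ** 2 / (2.0 * sigma_x ** 2)
             + math.sin(theta) ** 2 / (2.0 * sigma_y ** 2))
        # Closed-form radial integral of exp(-a * rho^2) * rho from 0 to r.
        total += (1.0 - math.exp(-a * r * r)) / (2.0 * a)
    integral = total * (2.0 * math.pi / steps)
    return integral / (2.0 * math.pi * sigma_x * sigma_y)


def radius_enclosing(fraction, sigma_x, sigma_y):
    """Radius of the centered circle enclosing the given probability."""
    lo, hi = 0.0, 10.0 * max(sigma_x, sigma_y)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if prob_within_radius(mid, sigma_x, sigma_y) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Circular probable error: the radius enclosing 50 percent of the volume.
cep_circular = radius_enclosing(0.5, 1.0, 1.0)
cep_elliptical = radius_enclosing(0.5, 1.0, 2.0)
```

In the circular case the result can be checked against the closed form $\sigma\sqrt{2 \ln 2}$; the elliptical case has no such simple expression, which is presumably why tables are useful.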


(Presented December 29, 1947 at the Chicago Meeting of the Institute)

1. The Asymptotic Analogue of the Theorem of Cramér and Rao. Herman
Rubin, Institute for Advanced Study, Princeton, N. J.

The author generalizes the results of Cramér and Rao on the minimum variance of estimates
to the case of the asymptotic distribution of an estimate. He shows that if certain
regularity conditions are satisfied, the formula given by Cramér and Rao remains valid.
The main results are obtained in the case of consistent estimates, but with a stronger set
of hypotheses, the results remain true for estimates which are not consistent. The method
used to obtain these results is to construct statistics to which the theorem of Cramér and
Rao can be applied, and whose variance converges to the variance of the limiting distribution.
This procedure is also applied to the case in which there is no limiting distribution,
and in which two sequences of distributions are considered which act as if they approach
each other.



BOOK REVIEWS 



Sequential Analysis. Abraham Wald. John Wiley and Sons, Inc. pp. vi, 212,
$4.00.

Reviewed by M. A. Girshick
Douglas Aircraft Company 

The development of sequential analysis as a new tool of statistics is by and 
large the work of Abraham Wald. This fact in itself would make the appearance
of a book by him on this subject an important event. However, Wald in
this book did more than discuss the present status of sequential theory. He
has, in fact, written a very lucid treatise on the general subject of statistical
inference, a treatise which is likely to have great influence on statistical thinking.

While this book is not written for the mathematically untrained, a knowledge 
of differential and integral calculus will suffice to follow all the arguments except
perhaps for some sections in the appendix where the more complicated
proofs have been placed. 

The main body of this book is divided into 3 parts and 11 chapters. Part I, 
covering chapters 1 to 4 inclusive, deals with the general theory of the sequential 
probability ratio test. Chapter 1 introduces in an elementary fashion the notion
of probability distributions, tests of hypotheses and the Neyman-Pearson
theory of two-valued decisions based on a fixed sample size. In Chapter 2, 
the general notion of a sequential test procedure is introduced and the operating 
characteristics of such tests are discussed. Chapter 3 deals with the sequential 
probability ratio test for testing a single hypothesis against a single alternative. 
Here the boundaries of this sequential criterion are expressed in terms of the 
risks, the operating characteristic and the average sample number functions 
are developed and bounds are obtained for the errors arising from truncation 
and neglect of excess over the boundaries. Chapter 4 presents a sequential 
theory for testing simple and composite hypotheses against a set of alternatives.
The fundamental idea introduced is the concept of a weight function in the
parameter space which permits handling composite hypotheses, or simple hypotheses
with many alternatives, by means of the sequential probability ratio
test. 
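
The way the boundaries of the sequential probability ratio test follow from the two risks can be illustrated for the binomial case. The following is a minimal Python sketch, assuming Wald's usual approximate boundaries $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$ on the likelihood ratio. The function name, the return labels, and the numerical values are illustrative and are not taken from the book under review.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Sequential probability ratio test of H0: p = p0 against H1: p = p1.

    observations: iterable of 0/1 outcomes, inspected one at a time.
    alpha, beta: desired probabilities of errors of the first and second kind.
    Returns (decision, number of observations used).
    """
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this value
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this value
    log_ratio = 0.0
    n = 0
    for x in observations:
        n += 1
        if x:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1.0 - p1) / (1.0 - p0))
        if log_ratio >= upper:
            return "accept H1", n
        if log_ratio <= lower:
            return "accept H0", n
    return "continue sampling", n


decision = sprt_bernoulli([1, 1, 1, 0, 0], 0.1, 0.3, 0.05, 0.05)
# -> ("accept H1", 3)
```

The sample size is not fixed in advance: a run of defectives stops the test after three items here, while a run of good items takes longer to reach the lower boundary.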

Part II of this book, consisting of chapters 5 to 9 inclusive, deals with the 
applications of sequential analysis to special problems. Chapter 5 contains a 
discussion of the binomial case with specific reference to lot-by-lot acceptance 
inspection. Of special interest in this chapter is the derivation of the exact 
characteristic function for a large class of tests and the development of upper 
and lower limits for the effect of grouping on the OC and ASN curves. Chapter 6
deals with the problem of double dichotomies. A procedure for testing
the difference between the parameters of two binomial distributions is developed
for the fixed size as well as the sequential procedure. Chapters 7, 8, and 9 are
concerned with the application of sequential analysis to the normal distribution. 
In these chapters the sequential probability ratio test is applied to hypotheses 
concerning the mean of a normal distribution when the variance is known, when 
the variance is not known (non-central t case) and hypotheses concerning the 
variance when the mean is known and when the mean is not known. 

Part III consists of two short chapters and deals with multi-valued decisions 
and sequential interval estimation. The results in these chapters are not definitive
answers to the two outstanding problems in statistical inference but are
merely suggestive of a possible approach to them. Nevertheless, from the 
point of view of stimulating future research these 2 chapters are perhaps the 
most valuable sections of this book. The reader, having been exposed in the 
previous chapters to various tests the outcome of which is a two-valued decision, 
is naturally led in Chapter 10 to the consideration of tests the outcome of which 
is a multi-valued decision. The notion of a risk function, introduced elsewhere 
by the author in the non-sequential case, is again used as the main tool in handling 
multi-valued decisions sequentially. In Chapter 11 the important problem of 
setting up confidence intervals of fixed length by means of a sequential procedure
is discussed and a possible method for accomplishing this is indicated.

As was previously noted, the main theorems on sequential analysis are contained
in the Appendix and since they have all been previously published in the
Annals they will not be mentioned in the present review. The Appendix, together
with the main body of the book, forms a fairly exhaustive treatment of
sequential theory. A notable exception to this is the lack of any mention of the 
published research on sequential point estimation. This is probably accounted 
for by the fact that this research came too late to be included in the book. Other 
minor omissions that may be noted are references to the generalization of the 
Fundamental Identity to more than one dimension and other theorems on 
sequences of functions of random vectors which have appeared in print. Also 
no mention is made of the similarity of sequential analysis to the problems of 
the random walk and the gambler’s ruin. This, in the opinion of the reviewer, 
is regrettable. 

This book will make a very suitable companion to the book Sequential Analysis
of Statistical Data: Applications prepared by the Statistical Research Group,
Columbia University (see review by J. W. Tukey, Ann. of Math. Stat. Vol.
xviii, 1947). While there is some overlap in the material covered, the two books
differ in emphasis. Wald’s book, though not highly technical, is more in the 
nature of a textbook on the theory and application of sequential analysis. The 
SRG book on the other hand, was prepared mainly for statisticians who may 
wish to use sequential analysis in practice. The latter book is therefore more 
detailed and puts less emphasis on the theoretical aspects of the sequential 
procedure. 

The book is surprisingly free of typographical errors, which is a tribute to the
high quality of the editorship.





Statistical Methods. George W. Snedecor. Ames, Iowa: The Iowa State
College Press, Inc., 1946; pp. xvi, 485. $4.50.

Reviewed by Frederick Mosteller 
Harvard University 

Statistical Methods is a non-mathematical treatment of modern experimental
statistics. Few non-mathematical books are available that treat such topics
as confidence limits, use of transformations, and analysis of variance and covariance
in the detail presented by Snedecor. The examples are largely, but not
entirely, drawn from agriculture and animal husbandry. The exercises for 
students are extensive and thought-provoking. 

Unlike most non-mathematical texts the book under review does not spend 
pages and pages on methods of recording frequencies and methods of computing 
countless moments which are seldom used in the later developments of the text. 
There is no long exasperating discussion of kurtosis and skewness; and there is 
no parade of qualitative Greek names for categorizing frequency distributions. 

The reviewer has used this book for teaching a second course in statistics to
social science majors with reasonable success. The main disadvantage was the
biological nature of most of the examples, but until some author writes a comparable
book using social science examples, the reviewer will continue to use
Snedecor’s material for a large part of the course. 

The main differences between the Third and Fourth Editions of this text have 
been adequately summarized by Snedecor: 

“(i) greater emphasis has been placed on the theoretical conditions in which 
the various statistical methods have validity, and concurrently (ii) on the conduct 
of the experiment so as to incorporate in the data the information desired; (iii) 
estimates and fiducial statements have been brought into equal prominence 
with tests of hypotheses; (iv) there is increased reliance on experimental samplings
to exemplify distribution theory; (v) the treatment of correlation and of
experimental designs has been expanded; and (vi) the methods for disproportionate
subclass numbers have been extended to include all those necessary for
ordinary needs.” Some more obvious changes in the Fourth Edition are the
entirely new type and summaries which are included at the end of some of the
chapters. The practice of using random sampling numbers (iv) to help explain
theory has long been employed by teachers of statistics, but few authors have
taken as much advantage of this technique as has Snedecor. In the Fourth
Edition confidence intervals are widely used (iii). The author uses the adjectives
“confidence” and “fiducial” more or less interchangeably, but it is the
reviewer’s opinion that it is the Neyman concept rather than the Fisherian that 
predominates. It should be remarked that this is one of the few texts that 
give the students the idea that in linear regression we do not predict y with the 
same accuracy for every x even when linearity and homoscedasticity hold (v). 
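
The remark about prediction accuracy can be made concrete. In simple linear regression with homoscedastic errors, the estimated standard error of the fitted mean at $x$ is $s\sqrt{1/n + (x - \bar{x})^2 / \sum_i (x_i - \bar{x})^2}$, which grows as $x$ moves away from $\bar{x}$. The following is a minimal Python sketch with made-up data; the function name is hypothetical and nothing here is taken from Snedecor's text.

```python
import math


def fitted_mean_standard_error(xs, ys, x_new):
    """Estimated standard error of the least-squares fitted mean at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual mean square on n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    return math.sqrt(s2 * (1.0 / n + (x_new - x_bar) ** 2 / sxx))


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
se_at_mean = fitted_mean_standard_error(xs, ys, 3.0)
se_at_edge = fitted_mean_standard_error(xs, ys, 5.0)
se_far_out = fitted_mean_standard_error(xs, ys, 10.0)
```

The three values increase in that order: the line is pinned down best at the mean of the observed $x$ values and least well under extrapolation.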

The main emphasis of the book is on the analysis of variance. The author 
succeeds extremely well in showing the student how to carry out the analysis
even at rather complex levels. On some other points he was not quite so successful.
For example, the reviewer feels that the meaning of “interaction” was
never gotten across, and that for the student the higher order interactions are 
still just things to be computed. Furthermore in attempting to make sure 
that the student understands how to do the computation the author often does 
not encourage the student to take any overall view of the data before blindly 
starting to compute. In addition, reasons for doing the experiment are sometimes
vague and the conclusions are often couched only in the jargon of analysis
of variance. Therefore, the student seldom gets an opportunity to find out 
what kinds of recommendations might reasonably be made as the result of an 
experiment. Perhaps the worst example is on pages 275-280. Here the 
experiment deals with yield of wheat in 48 pots, with two series of soil treatments, 
humus and chemical. Anyone glancing over the results of the experiment will 
be startled to find that every yield from pots with “no humus treatment” (12 
observations) is greater than any yield with “humus treatment” (36 observations).
The reader will be further startled to find that all the evidence tends
to support the notion that “no chemical treatment” is at least as fruitful as any 
of the chemical treatments tried. However, Snedecor says “The striking feature 
of this experiment is the discrepance among the subclasses. The chemicals 
applied to one humus treatment produced yields out of accord with those from 
other humus treatments.” Snedecor then pushes on to a more subtle analysis. 
The reviewer feels that here as elsewhere in the book the author occasionally 
forgets that the extended analysis looks rather ridiculous unless the practicality 
of applying the technique is discussed. The example considered here is one in 
which the point could profitably be made that everyone can see from a visual 
examination of the data what the results of the experiment show. The analysis 
backs up the student’s common sense appraisal of the situation and gives him 
more confidence in and understanding of the method when it is applied in more 
delicate situations. It seems to the reviewer that too many times the application
of the analysis of variance obfuscates the main point of the experiment.
In the haste to get to the computations and the comparisons of interactions and
errors the author frequently neglects to impress the student with the fundamental
differences between means and their ultimate interpretation. However,
the author does bring out clearly the notion of the various estimates of variance, 
a subject frequently neglected. 

In the next to last chapter the binomial and Poisson distributions are discussed. 
In this connection the inverse sine and the square root transformations are 
treated briefly, as is the logarithmic transformation. It is surprising that no 
indication is given of the theoretical variances when the inverse sine and square 
root transformations are used. The theoretical discussion of the transformation
is limited to the remark that these transformations tend to make the variance 
independent of the means, but there is no indication of the further advantages. 
This is surprising because in a much earlier chapter the use of Fisher’s transformation
for correlation coefficients was treated quite adequately. It seems
to the reviewer that in a later edition the use of transformation might well be
moved forward in the book, and that the theoretical and practical implications 
might be treated more thoroughly. 
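
For reference, the theoretical variances the reviewer has in mind are, by the usual large-sample approximations, about $1/(4n)$ for $\arcsin\sqrt{X/n}$ (in radians) when $X$ is binomial with index $n$, and about $1/4$ for $\sqrt{X}$ when $X$ is Poisson with a moderately large mean. The following is a minimal Python sketch that checks these approximations by direct summation over the distributions; the function names and parameter values are illustrative assumptions and come from neither the book nor the review.

```python
import math


def arcsine_variance(n, p):
    """Exact variance of arcsin(sqrt(X / n)) in radians, X ~ Binomial(n, p)."""
    mean = 0.0
    mean_sq = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        value = math.asin(math.sqrt(k / n))
        mean += prob * value
        mean_sq += prob * value * value
    return mean_sq - mean * mean


def sqrt_poisson_variance(lam, terms=400):
    """Variance of sqrt(X), X ~ Poisson(lam), summing the first `terms` terms."""
    prob = math.exp(-lam)  # P(X = 0)
    mean = 0.0
    mean_sq = 0.0
    for k in range(terms):
        if k > 0:
            prob *= lam / k  # P(X = k) from P(X = k - 1)
        mean += prob * math.sqrt(k)
        mean_sq += prob * k  # (sqrt(k))^2 = k
    return mean_sq - mean * mean


n = 50
# Ratios to the approximation 1 / (4n); values near 1 for both p show that
# the variance hardly depends on the mean after transformation.
ratios = [arcsine_variance(n, p) * 4 * n for p in (0.3, 0.5)]
poisson_var = sqrt_poisson_variance(20.0)
```

The point of both transformations is visible in the output: the variance after transformation is nearly the same constant whatever the underlying proportion or mean.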

As in most other texts the final chapter “Design and Analysis of Samplings” 
needs very considerable expansion. 

The book begins (Chapter 1) with a consideration of the sampling of attributes, 
inferences that can be drawn about the population, confidence limits, use of 
chi-square in a 1 x 2 table, and some discussion of the use of ratios, rates, and 
percentages. Measurement data is then (Chapter 2) discussed including the 
computation and application of the mean, range, standard deviation, probable 
deviation, median, and quartiles. The concepts of null hypothesis and confidence
limits are introduced in Chapter 2 and elaborated in Chapter 3 which
concerns sampling from a normally distributed population, random samples,
distribution of the mean, variance, standard deviation, and of t. The comparison
of two groups in contrast to individuals is treated in Chapter 4 including
groups with different numbers of individuals. Chapter 5 provides material on 
short cut methods of computation using calculating machines, code numbers 
are explained, suggestions about significant numbers and rates and percentages 
are given, and the use of the ratio range/sigma is introduced. 

After considering linear regression and correlation (Chapters 6, 7) the author 
relates the two notions, and then goes on to consider some interesting special 
cases of correlation. Chapter 8 deals with large sample methods. Chapter 9 
concerns enumeration data with more than one degree of freedom, discusses 
adjustments of chi-square and its computation with large numbers of degrees of 
freedom, and describes the analysis of 2 x 2 x 2, R x 2, and R x C tables. The 
computation of the analysis of variance for two or more groups of measurement 
data and with two or more criteria of classification: variance ratio F, use of 
Latin square, analysis with disproportionate subclass numbers, and the use of 
randomized blocks are considered in Chapters 10 and 11, while analysis of covariance
is treated in Chapter 12 (22 pages). Multiple regression including
partial and multiple correlation coefficients, tests of significance and confidence 
limits are handled in Chapter 13 and curvilinear regression considered in Chapter 
16. Chapter 16 deals with binomial and Poisson data, and Chapter 17 discusses 
the design and analysis of sampling, including sampling from a homogeneous 
or small population and the effectiveness of stratification. 

It seems to the reviewer that at the present time one would be hard put to 
find a better statistics text written at this level. 



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Franz L. Alt, who has been with the Econometric Institute, New York, 
as Assistant Director of Research, is now Deputy Chief of the Computing Laboratory
at the Ballistic Research Laboratories, Aberdeen Proving Ground, Aberdeen,
Maryland.

Mr. A. George Carlton has accepted a position as Assistant Professor of 
Mathematics at the University of Illinois. 

Assistant Professor Paul R. Halmos, University of Chicago, Chicago, Illinois 
is on leave for the academic year. He is spending the year at the Institute for 
Advanced Study, Princeton, New Jersey on a Guggenheim Fellowship and 
will return to the University of Chicago in September, 1948. 

Mr. Henry F. Hebley of the Pittsburgh Coal Co. spent most of last summer 
in Eastern Europe carrying out a survey on coal production and fuel availability
in Poland. This work was carried out in the interest of the International
Bank for Reconstruction and Development. 

Dr. Harold D. Larsen, former Associate Professor at the University of New 
Mexico, has joined the faculty of Albion College, Albion, Michigan. 

Mr. Dickson H. Leavens has resigned as Research Associate of the Cowles 
Commission for Research in Economics. He will continue as Managing Editor 
of Econometrica and may be addressed at 1632 Wood Avenue, Colorado Springs, 
Colorado. 

Professor S. B. Littauer, who has been Chairman of the Mathematics Department,
Newark College of Engineering, Newark, New Jersey, has now accepted
an associate professorship in the Department of Industrial Engineering, Columbia 
University. 

Professor Harris F. MacNeish, who has been Chairman of the Department of 
Mathematics at Brooklyn College since its foundation in 1930, has resigned 
to accept a visiting professorship in Mathematics at the University of Miami,
Coral Gables, Florida. 

Mr. Clifford J. Maloney has resigned a position as Research Associate in the 
Statistical Laboratory of Iowa State College to serve as Chief, Statistics Branch, 
Camp Detrick, Frederick, Maryland, an agency of the Chemical Corps of the 
United States Army. 

Mr. Monroe L. Norden, who was formerly with the Ballistic Research
Laboratories, Aberdeen Proving Ground, Maryland, has accepted a research 
position in theoretical or mathematical statistics at the Douglas Aircraft Co., 
Santa Monica, California. 

Mr. W. E. Pattee has resigned his position as statistical engineer with the 
Canadian Industries Limited, Shawinigan Falls, Quebec and has accepted a
position as senior chemist, Ottawa Mill, E. B. Eddy Company, Hull, Quebec. 



Mr. Robert I. Piper, who was formerly plant staff assistant at the Southern 
California Telephone Company of Los Angeles, has been transferred to the 
systems office of the Pacific Telephone and Telegraph Company. He will assist 
in planning and analysing sampling surveys of the wage rates prevailing in
the Pacific coast states in which the company operates. 

Mr. Herbert Solomon, who was formerly an instructor at the College of the 
City of New York, has accepted an assistant professorship in the Mathematics 
Department, Newark College of Engineering, Newark 2, New Jersey. 

Dr. A. G. Swanson, formerly an assistant chairman of the Department of 
Mathematics and Mechanics at the General Motors Institute, Flint, Michigan, 
has accepted an associate professorship in the Department of Mathematics, 
Gustavus Adolphus College, St. Peter, Minnesota. 


A federal center of applied mathematics—the National Applied Mathematics 
Laboratories—has been established as a division of the National Bureau of 
Standards. The new organization is oriented around modern mathematical
statistics as applied to the physical and engineering sciences and to the development
and use of modern high speed computing. The applied mathematics
laboratories include four separate laboratories: the Institute of Numerical 
Analysis; the Computation Laboratory; the Statistical Engineering Laboratory; 
and the Machine Development Laboratory. 

Two members of the Institute have been given important positions in this 
organization. Dr. John Curtiss, who has been Director's Assistant in Applied 
Mathematics at the Bureau of Standards, has been named Chief of the National 
Applied Mathematics Laboratories. Dr. Churchill Eisenhart has been appointed
head of the Statistical Engineering Laboratory.


Statistical Summer Sessions at the University of California, Berkeley 

Following the encouraging experience of last year the University of California 
offers statistical programs in the two Summer Sessions of 1948. The teaching 
staff is as follows: 

Raj Chandra Bose, Professor of the University of Calcutta, India. 

Miss Evelyn Fix, Lecturer at the University of California, Berkeley. 

Erich L. Lehmann, Assistant Professor of the University of California, 
Berkeley. 

Michel Loève, Reader at the University of London, England.

Jerzy Neyman, Professor of the University of California, Berkeley. 

Abraham Wald, Professor of Columbia University, New York. 

Courses in statistics are offered on both the graduate and the undergraduate 
levels. The graduate courses, all given during the First Summer Session, June 21
to July 31, are meant primarily for students who either have already obtained
their Ph.D. degree or are working towards it. Therefore, apart from formal 
classes, it is proposed to hold extensive seminars in which the work of students 
will be discussed. No specific prerequisites to graduate courses will be required. 
However, to benefit from the courses, the students must be generally familiar 
with the theory of statistics. In addition, course 272 and especially 271 will 
require a reasonable knowledge of the theory of functions. 

There will be two undergraduate courses offered, course S12 during the First 
Summer Session, June 21 to July 31, and course S113 during the Second Summer 
Session, August 2 to September 11. Both of these courses were recently introduced
into the curriculum and are prerequisites to more advanced courses
in statistics. They are offered during the Summer Sessions for the benefit of 
students, otherwise advanced, who plan to attend more advanced courses in 
statistics during the fall semester. Besides, course S12 is recommended for 
students who do not intend to specialize in statistics but wish to acquire some 
knowledge of this subject as a part of their general education. 

The Statistical Laboratory will be available for students doing research. 

First Summer Session 


S12. Elements of Probability and Statistics. Mr. Lehmann.

271. Random Functions. Mr. Loève.

272. Sequential Analysis. Mr. Wald.

273. Design of Experiments. Mr. Bose.

S290s. Seminar in Theory of Statistics. Mr. Loève, Mr. Wald.

290t. Seminar in Design of Experiments. Mr. Bose.

S295. Individual Research. Mr. Bose, Mr. Loève, Mr. Neyman, Mr. Wald.


Second Summer Session 

S113. Second Course in Probability and Statistics. Miss Fix. 


Statistical Sessions at Alabama Polytechnic Institute 

Professor George W. Snedecor, President of the American Statistical Association
and Research Professor of Statistics at Iowa State College, will be Visiting
Research Professor of Statistics at Alabama Polytechnic Institute during the 
Spring Quarter, from March 22 to June 4, 1948. Professor Snedecor will
lecture on Statistical Experimental Design and will be available for statistical 
consultations. 

The newly formed Statistical Laboratory at A.P.I. will also offer a course
in Survey Sampling during the Spring Quarter to be taught by the Director, 
Professor T. A. Bancroft. Conferences in applied statistics for research workers 
in the lower southeastern states are being scheduled during the time of Professor
Snedecor’s visit.





New Members 

The following persons have been elected to membership in the Institute 
(September 1 to November 30, 1947) 

Afzal, M., M.A. (Panjab, India) Graduate student at Columbia Univ., 1038 John Jay Hall,
Columbia University, New York 27, New York.

Billeter, Ernest P., Ph.D. (Univ. of Basle) Scientific Assistant (Statistical Office, Zurich) 
Turnerstrasse 23, Basle, Switzerland.

Bishop, David James, M.Sc. (London) Head of Operational Research Section of British 
Iron and Steel Research Association, 11 Park Lane, London W. 1., England. 

Brooks, Hamilton, B.See (Univ. of Pittsburgh) Design Engineer, Westinghouse Electric 
Corp., P.O. Box 988, E. Pittsburgh, Pennsylvania.

Craw, Alexander R., M.S. (Univ. of Notre Dame) Instructor in Math., U. S. Naval 
Academy, Annapolis, Maryland. 

Edwards, Daisy M., A.M. (Columbia Univ.) Lecturer in Statistics, University of 
London, Institute of Education, 7, Oakfield Court, Queens Road, Weybridge, Surrey,
England.

Havermark, K. Gunnar, Chief of Division, Royal Social Board, Lagerlojsg 8, Stockholm, 
Sweden. 

Hollingsworth, Charles A., Ph.D., (State Univ. of Iowa) Research Chemist, 504 Maple 
Ave., Waynesboro, Virginia. 

Hurd, Cuthbert C., Ph.D. (Univ. of Ill.) Plant Statistician, Carbide and Carbon Chemicals
Corp., Oak Ridge, Tenn.

Isaacson, Stanley L., M.A. (Johns Hopkins Univ.) Graduate student at Columbia 
Univ., 2528 Loyola Southway, Baltimore, Maryland.

May, Kenneth, Ph.D., (Univ. of Calif.) Assistant Professor of Mathematics, Carleton 
College, Northfield, Minnesota. 

Mirsky, Robert, A.M. (Johns Hopkins Univ.) Graduate student at Columbia Univ., 
7 West 705th Street, Shanks Village, Orangeburg, New York.

Mulhall, Harold, B.Sc. (Sydney) Lecturer in Mathematics, Department of Mathematics,
University of Sydney, Australia.

Palm, Conny, Ph.D. (Stockholm) Docent, Ynglingar 11, Djursholm, Sweden.

Pease, Katharine, A.M. (Smith College) Instructor in Psychology, Barnard College, 
Columbia University, New York 27, New York. 

Peckham, Cyril G., M.S. (Univ. of Ill.) Assistant Professor of Mathematics, University 
of Dayton, Dayton 9, Ohio. 

Peterson, Raymond P., Jr., B.A. (Univ. of Calif., Los Angeles) Assistant in Mathematics,
University of California, Los Angeles, Calif., 10729 Ashton Ave., Los Angeles
24, California.

Pike, Eugene W., Ph.D., (Princeton) Member McFarlan, Groth & Pike, 510 Audubon
Ave., New York 88, New York. 

Pitman, Edwin J. G., M.A. (Univ. of Melbourne) Professor of Mathematics, Univ. of 
Tasmania, Hobart, Tasmania. 

Rigby, Fred D., Ph.D., (Univ. of Iowa) Mathematician, Office of Naval Research, P.O. 
Box 284, Falls Church, Virginia.

Smith, Clarence DeWitt, Ph.D. (Univ. of Iowa) Associate Professor of Statistics, Box 
2686, University, Alabama. 

Srinivasan, T. K., M.A. (Madras) Assistant Lecturer, Mathematics Department,
Raja’s College, Pudukkottah, S-I-R, South India. 

Straubel, Morgan P., Quality Control Analyst, 4124 Ivanrest Road, Grandville, Michigan.

Taylor, William F., A.B. (Univ. of Calif., Berkeley) Associate, School of Public Health, 
8042 Wheeler St., Berkeley, California.





Trindade, Mario, Chief of the Statistical Division of the Instituto de Resseguros do Brazil, 
Rua Senador Soares 38, ap. 301, Rio de Janeiro, Brazil.

Von Schelling, Hermann, Ph.D. (Univ. of Berlin) Naval Medical Research Laboratory,
U. S. Submarine Base, New London, Conn.

Whidden, Phillips, A.B. (Harvard) Part-time Instructor in Mathematics, Carnegie Institute
of Technology, Pittsburgh 13, Pa.

Wolman, William, B.B.A. (College of City of New York) Statistician, New York State 
Division of Housing, 295 Parkside Avenue, Brooklyn 26, New York.

Woodbury, Lowell A., Ph.D. (Univ. of Michigan) Assistant Professor of Physiology, 
Dept. of Physiology, University of Utah Medical School, Salt Lake City 1, Utah.

Yusuf, Mohammad, M.A. (Aligarh Muslim Univ., India) Graduate student at Columbia
University, 208, Furnald Hall, Columbia University, New York 27, New York.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 

The thirtieth meeting of the Institute of Mathematical Statistics was held in 
Berkeley, California on Monday and Tuesday, December 22 and 23, 1947. The 
meeting was attended by approximately 70 persons, including the following 31 
members of the Institute: 

G. A. Baker, G. G. Beckstead, B. M. Bennett, R. U. Bonner, Frances L. Campbell, E. L.
Crow, Dorothy Cruden, W. J. Dixon, R. Dorfman, G. G. Eldredge, E. A. Fay, Evelyn Fix,
M. A. Girshick, J. Gurland, T. E. Harris, W. L. Hart, J. L. Hodges, Jr., P. G. Hoel, H. M.
Hughes, T. A. Jeeves, H. S. Konijn, G. M. Kuznets, E. L. Lehmann, R. B. Leipnik, J. Neyman,
Gladys Rappaport, H. Scheffé, T. W. Simpson, C. M. Stein, J. E. Walsh and H.
Working.

The Monday morning program, with Professor J. Neyman presiding, consisted 
of the following contributed papers: 

1. The Performance Characteristic of Certain Methods for Obtaining Confidence Intervals. 

Mr. B. M. Bennett, University of California, Berkeley. 

2. Some Further Results on the Bernoulli Process. 

Dr. T. E. Harris, Douglas Aircraft Company. 

3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions. 

Dr. E. L. Lehmann and Dr. C. M. Stein, University of California, Berkeley. 

4. On the Selection of Forecasting Formulas. 

Professor P. G. Hoel, University of California, Los Angeles. 

The Monday afternoon program, with Professor H. Scheffé presiding, also
consisted of contributed papers as follows: 

1. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher Problem. 

Dr. J. E. Walsh, Douglas Aircraft Company. 

2. On Sequences of Experiments. 

Dr. C. M. Stein, University of California, Berkeley. 

3. The Effect of Selection above Definite Lower Limits of Linear Functions of Normally
Distributed Correlated Variables on the Means and Variances of Other Linear Functions.

Professor G. A. Baker, University of California, Davis. 

4. An Inversion Formula for the Distribution of a Ratio of Random Variables. 

Dr. J. Gurland, University of California, Berkeley. 

5. Independence of Parameters and Sufficient Statistics. 

Dr. E. W. Barankin, University of California, Berkeley. 

The Tuesday morning session, with Professor R. A. Gordon presiding, was 
devoted to the following invited and contributed papers on econometrics: 

1. Remarks on the Theory of Indices. 

Professor G. C. Evans, University of California, Berkeley. 

2. Interrelations of Theory and Statistical Research in Economics. 

Professor H. Working, Stanford University. 

3. Statistical and Case Methods in a Study of Labor Mobility. 

Professor D. McEntire, University of California, Berkeley. 

Discussion: Dr. M. Lipton, University of California, Berkeley. 



4. Distributions Associated with Continuous Stochastic Processes. 

Dr. R. B. Leipnik, University of California, Berkeley. 

5. On Some Methods of Evaluating Railway Costs. (By title) 

Miss Evelyn Fix, University of California, Berkeley. 

There was a dinner on Monday evening for members and guests at the Hotel 
Claremont and an informal discussion and coffee on Tuesday afternoon. 


REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The Tenth Annual Meeting of the Institute of Mathematical Statistics was 
held at the Commodore Hotel, New York City, on December 28-30, 1947. The
meeting was held in conjunction with the American Statistical Association. 
The following 173 members of the Institute were in attendance: 

F. S. Acton, R. L. Anderson, H. K. Arnold, L. A. Aroian, M. Astrachan, H. M. Baldwin, 
W. D. Baten, R. E. Bechhofer, G. W. Beebe, M. H. Belz, A. A. Bennett, A. J. Berman, A. 
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, T. H. Brown, M. A. Brumbaugh,
M. C. Bruyere, P. T. Bruyere, T. A. Budne, R. W. Burgess, R. S. Burington, B. H. Camp, 
G. C. Campbell, P. G. Carlson, Jr., U. Chand, H. Chernoff, Kai-Lai Chung, P. C. Clifford, 
W. G. Cochran, D. D. Cody, J. Cornfield, G. M. Cox, J. H. Curtiss, J. F. Daly, G. B. Dantzig,
D. G. Deihl, H. F. Dorn, A. J. Duncan, C. W. Dunnett, D. Durand, J. Dutka, P. S.
Dwyer, G. L. Edgett, C. Eisenhart, B. Epstein, M. W. Eudey, W. D. Evans, Will Feller,
C. D. Ferris, C. B. Fine, M. M. Flood, L. R. Frankel, J. E. Freund, B. Friedman, Hilda
Geiringer, M. A. Geisler, H. H. Germond, M. A. Girshick, Abraham Golub, C. H. Graves,
S. W. Greenhouse, J. A. Greenwood, T. N. E. Greville, J. I. Griffin, E. T. Gumbel, M. 
Gurney, K. W. Halbert, Max Halperin, M. H. Hansen, T. E. Harris, B. Harshbarger,
Alex Hart, P. M. Hauser, J. D. Heide, L. H. Herbach, M. W. Hirsch, Harold Hotelling, H.
M. Humes, C. C. Hurd, S. Jablon, C. M. Jaeger, A. S. Kaitz, Leo Katz, T. L. Kelley, L. S. 
Kellogg, L. F. Knudsen, A. K. Kury, Jack Laderman, M. LeLeika, Joseph Lev, Howard 
Levene, J. E. Lieberman, Julius Lieblein, S. B. Littauer, Eugene Lukacs, Geo. A. Lundberg,
J. C. McPherson, Benjamin Malzberg, Sophie Marcuse, E. S. Marks, H. C. Mathisen, J. 
W. Mauchly, A. L. Mayerson, Margaret Merrell, E. B. Mode, E. C. Molina, M. E. Moore,
D. J. Morrow, J. E. Morton, Jack Moshman, Hugo Muench, D. N. Nanda, M. G. Natrella,
Doris Newman, G. E. Nicholson, Jr., Harold Nisselson, Nilan Norris, H. W. Norton, P. S. 
Olmstead, A. L. O’Toole, A. E. Paull, C. N. Payne, Katherine Pease, M. P. Peisakoff, E.
W. Pike, O. A. Pope, G. B. Price, L. J. Reed, J. S. Rhodes, S. F. Robinson, A. C. Rosander,
Ernest Rubin, P. J. Rulon, Rose Sachs, Frank Saidel, Arthur Sard, M. M. Sandomire, F.
E. Satterthwaite, E. D. Schell, Bernice Scherl, O. N. Serbein, R. G. Seth, Harry Shulman,
Rosedith Sitgreaves, C. DeW. Smith, G. W. Snedecor, Herbert Solomon, D. E. South, 
Arthur Stein, G. T. Steinberg, Joseph Steinberg, A. I. Sternhell, S. A. Stouffer, J. V. Sturtevant,
B. R. Suydam, W. R. Thompson, Gerhard Tintner, J. W. Tukey, D. F. Votaw, Jr.,
A. J. Wadman, H. M. Walker, Dzung-shu Wei, Sidney Weiner, Samuel Weiss, Sophie R. 
Wilkey, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, Jacob Wolfowitz, W. J. Youden. 

The first session, a joint session with the American Statistical Association, was
held on the morning of December 28 and was devoted to the topic The Teaching 
of Statistics. Professor W. G. Cochran of North Carolina State College presided. 
A paper entitled Three Recent Reports Dealing with the Teaching of Statistics,
the Training of Statisticians and the Crisis in Statistical Personnel was presented
by Dr. James D. Paris of the Metropolitan Life Insurance Company. Many 
members participated in the general discussion which followed. 

The second session on The Teaching of Statistics, also a joint session with the American
Statistical Association, was held at 1:15 P.M. Professor Francis G. Cornell of the
University of Illinois was chairman. The main paper of the session was
by Professor George W. Snedecor of Iowa State College, entitled Syllabus
for a Proposed Course in Basic Statistics. This was followed by prepared discussion
by Professors Elmer B. Mode, Boston University; Helen M. Walker,
Teachers College, Columbia University; Samuel A. Stouffer, Harvard University;
and Albert E. Waugh, Department of Economics, University of Connecticut.
Many members participated in the general discussion. At the
conclusion of this session, a film on Modern Quality Control was shown by Mr.
Simon Collier of the Johns Manville Company.

Two Monday sessions, also held jointly with the American Statistical Association,
and with the cooperation of the Operations Evaluation Group of the
Navy and the Operations Analysis of the Air Force, were devoted to Operations
Research. Professor Edward L. Bowles of Massachusetts Institute of Technology
presided at the morning session. The following papers:

1. Operations Research in the Department of the Navy.

Dr. J. Steinhardt, Director, Operations Evaluation Group. 

2. Operations Research in the Department of the Air Forces.

Dr. Leroy A. Brothers, Chief, Operations Analysis. 

were followed by discussion by Dr. Arthur A. Brown, Operations Evaluation 
Group, Dr. Thomas I. Edwards, Operations Analysis, Professor G. Baley Price, 
The University of Kansas and Wartime Operations Analyst and Dr. W. J. 
Youden, Douglas Aircraft Company and Wartime Operations Analyst. 

Dr. Merrill M. Flood, Assistant Deputy Director of Research and Development,
General Staff, U. S. Army, presided at the afternoon session. The following
papers were presented:

1. Operations Analysis in the Southwest Pacific Air War. 

Dr. Roger I. Wilkinson, Bell Telephone Laboratories and Wartime Operations Analyst.

2. Operations Analysis of Air-Sea Rescue.

Dr. E. S. Lamar, Operations Evaluation Group. 

3. Factorial Chi-Square in Test Shooting. 

Dr. A. E. Brandt, Technical Director, Naval Ordnance Laboratory and Wartime
Operations Analyst.

4. Mathematical Techniques of Program Planning. 

Dr. George Dantzig, Consultant to the Air Comptroller, Headquarters, USAF. 

A session on the Application of the Theory of Extreme Values was held jointly 
with the American Statistical Association on Tuesday, December 30. Professor 
Jacob Wolfowitz of Columbia University presided at the session. The following 
papers were presented: 





1. Introduction: The Mathematical Theory of Extreme Values.

Professor Richard Von Mises, Harvard University. 

2. Applications to the Prediction of Flood Flows. 

Professor Emil Gumbel, Brooklyn College. 

3. Applications to Meteorology. 

Dr. Horace Norton, Weather Bureau, Washington, D. C. 

4. Applications to Fracture Problems. 

Dr. Benjamin Epstein, Coal Research Laboratory, Carnegie Institute of Technology. 

The session concluded with discussion by Miss Marion Sandomire, Navy Department,
Bureau of Ships and Dr. Bradford Kimball, Port Washington, New York.

A session on Statistical Techniques in Life Insurance was held jointly with 
the American Statistical Association at 1:15 P.M., December 30. Mr. Robert 
J. Myers, Actuarial Consultant, Social Security Administration, was chairman 
of the meeting. The following papers were presented: 

1. Problems with Sampling Procedures for Reserve Valuations.

Mr. George C. Campbell, Supervisor, Actuarial Division, Metropolitan Life Insurance
Company.

2. Sampling Errors in Life Insurance Mortality and Other Statistics. 

Mr. Donald Cody, Assistant Actuary, Equitable Life Assurance Society. 

3. Recent Developments in Graduation and Interpolation. 

Dr. T. N. E. Greville, National Office of Vital Statistics, U. S. Public Health Service. 

A session of contributed papers was held at 3:30 P.M. on December 30. Dr. 
T. N. E. Greville of the National Office of Vital Statistics presided. The following
papers were presented:

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted
Fourier Series. (Preliminary Report.)

Professor R. L. Anderson, North Carolina State College and Professor T. W. Anderson,
Jr., Columbia University.

2. Some New Methods for Distributions of Quadratic Forms. 

Professor Harold Hotelling, Institute of Statistics, University of North Carolina. 

3. Frequency Functions Defined by the Pearson Difference Equation. 

Professor Leo Katz, Michigan State College, East Lansing. 

4. Distribution of the Sum of Roots of a Determinantal Equation Under a Certain Condition. 

Mr. D. N. Nanda, Institute of Statistics, University of North Carolina. 

5. Applications of Carnap's Probability Theory to Statistical Inference.

Professor Gerhard Tintner, Department of Economics, Iowa State College. 

6. Circular Probable Error of an Elliptical Gaussian Distribution. 

Dr. H. H. Germond, S. W. Marshall & Co., Washington, D. C.

The annual business meeting of the Institute was held at 4:30 P.M., December 
29, 1947 in the ball room of the Commodore Hotel. There were reports by the
President, Secretary-Treasurer, Mr. Morris Hansen, Chairman of the Committee
on Planning and Development, and Dr. John Curtiss, Chairman of the
Program Committee. Mr. Hansen presented a tentative form of the proposed 
new constitution while Dr. Curtiss discussed program plans. There was some 
discussion on these general questions from the floor. 



Professor A. Wald was elected President, and Dr. Churchill Eisenhart and
Professor Henry Scheffé, Vice-Presidents.

Paul S. Dwyer,
Secretary.


REPORT ON THE CHICAGO MEETING OF THE INSTITUTE 

The thirty-second meeting of the Institute of Mathematical Statistics was 
held at the Sherman Hotel, Chicago, Monday and Tuesday, December 29-30. 
The meeting was held in conjunction with the one hundred fourteenth meeting 
of the American Association for the Advancement of Science and Co-operating 
Associated Societies. The following twenty-eight members of the Institute 
attended the meeting: 

W. Bartky, D. H. Blackwell, G. M. Brown, I. W. Burr, A. G. Carlton, M. Castellanos, 
C. W. Cotterman, A. T. Craig, J. H. Davidson, R. C. Davis, W. E. Deming, M. Elveback, 
M. L. Garbuny, W. W. Gutzman, T. J. Jaramillo, E. S. Keeping, T. C. Koopmans, E. L. 
Lahti, M. M. Lavin, K. May, J. A. Pierce, O. Reiersøl, H. Rubin, L. J. Savage, J. Silber,
W. A. Wallis, E. L. Welker and J. W. Wilkins. 

The Monday afternoon session was devoted to contributed papers of Section A, 
AAAS, and of the Institute, and to the Vice-Presidential address of Section A. 
The following papers were presented: 

1. On the Boundary Layer Motion along a Periodically Oscillating Plane in Compressible 
Viscous Fluids. 

Dr. M. Z. Krzywoblocki, University of Illinois. 

2. Variations of the Probability of Unfair Election Results. 

Dr. Kenneth May, Carleton College. 

3. Normal Equations with Nearly Vanishing Determinants. 

Dr. M. Herzberger and Dr. R. Norris. 

4. Composition of Binary Quadratic Forms. 

Professor Gordon Pall, Illinois Institute of Technology. 

5. A Proof of the Asymptotic Analogue of the Theorem of Cramér and Rao.

Dr. Herman Rubin, Institute for Advanced Study. 

6. The Solution of Differential Equations in the Presence of Turning Points, Vice-Presidential
address of Section A.

The Tuesday afternoon session was also a joint session of Section A and the 
Institute, with Dean Walter Bartky of the University of Chicago presiding. 
The following two papers were presented upon invitation of the Institute: 

1. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics. 
Professor P. R. Halmos and Dr. L. J. Savage, University of Chicago. 

2. Unbiased Sequential Estimation.

Professor David Blackwell, Howard University.



REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1947 

The healthy growth of the Institute has continued through 1947. The 
membership increased from 900 to 1046. This increase is gratifying as a sign 
that more and more people appreciate the usefulness of basic theory and are 
ready to support research by making our Annals possible. It is also pleasing
to note that statistical theory and methodology are reaching new fields and 
that new groups as a whole are becoming conscious of the usefulness of contact 
with mathematical statistics. These developments are reflected in the meetings 
of the Institute. 

Meetings. The Ninth and Tenth Annual meetings (for 1946 and 1947) were
held in the traditional way in conjunction with the meetings of the American 
Statistical Association (January—Atlantic City and Christmas—New York). 
The Tenth Summer Meeting was held with the American Mathematical Society 
and the Mathematical Association of America (September—Yale). Regional 
meetings were held in California (June—San Diego, December—Berkeley) and 
in Chicago (December), the latter in conjunction with the meetings of the 
American Association for the Advancement of Science (AAAS). Moreover, 
two meetings were organized with specialized programs of interest to groups 
with whom the Institute has not previously had much contact. A meeting 
in April at Columbia University, co-sponsored by the American Mathematical 
Society, was devoted to Stochastic Processes and Random Noise, and another
meeting held simultaneously at Atlantic City was in conjunction with the meeting 
of the Eastern Psychological Association. It is clear that with such diversified 
meetings the Program Committee could not always act as a unit. J. H. Curtiss 
was its Chairman and J. Neyman and J. W. Tukey arranged some of the programs.
Other members of the Committee were: C. W. Churchman, T.
Koopmans, F. C. Mosteller, J. Neyman, H. Scheffé, J. Wolfowitz, and H.
Working.

At the Tenth Summer Meeting A. Wald delivered the first Henry L. Rietz 
Memorial Lecture. It is desirable to preserve the solemnity of the occasion 
of the Rietz lectures and it was therefore decided that they should not be given 
every year. Accordingly, no Rietz lecturer has been selected for 1948. 

The Institute had no share in the program of the International Statistical 
Congress in Washington. However, Fellows of the Institute were invited to 
that Congress. This Congress and the Princeton Bi-Centennial were beneficial 
by establishing more intimate personal ties with our European colleagues. It is 
widely felt on both sides of the ocean that a closer cooperation, in particular 
with British statisticians, is highly desirable. Various suggestions in that 
direction were informally discussed in Washington and Princeton and M. G. 
Kendall has kindly consented to explore the practical possibilities. It is needless 
to say that the Institute is eager to do everything possible to promote cooperation 
and increase its usefulness also to our British colleagues. 



Relations with other organizations. It is gratifying to note that the cooperation
of the Institute with sister societies is growing in intensity. The last two Presidential
reports mentioned plans for a reorganization of the American Statistical
Association with a view to more intimate relations among statistical societies.
The revision of the constitution of the Association is not yet completed. It now
appears that the American Mathematical Society also feels the need of closer
collaboration with all groups interested in applied mathematics. It is too early to
predict the results of these movements but it is clear that we must devote careful 
thought to our own organization and to our future relations with other groups. 

In 1947 the AAAS organized an Inter-Society Committee for the National 
Science Foundation Legislation. At the first meeting in Washington we were 
represented by J. H. Curtiss and W. A. Shewhart and at the meeting in December 
in Chicago by W. Bartky. In ballots on the two controversial subjects the 
Institute voted against exclusion of social sciences and abstained on the question 
of patent rights. W. Feller represented the Institute on the Policy Committee 
of the American Mathematical Society. Through this Committee the Institute 
went on record as favoring the National Science Foundation Bill. Otherwise 
the discussions of the Policy Committee were mostly connected with the establishment
of an International Mathematical Union. Cletus O. Oakley represented
the Institute on the Publicity Committee of the American Mathematical
Society of which he is chairman. G. W. Snedecor was our representative on the
AAAS Council, W. Bartky on the National Research Council, F. C. Mosteller 
and S. S. Wilks on the Joint Committee for the Development of Statistical 
Application in Engineering and Manufacturing. In recent years the common 
interests of the Institute and the actuarial profession have grown in importance 
and it has been suggested that closer cooperation would be beneficial to both
parties. A new committee has been established to explore these possibilities
and in particular to arrange a joint meeting during 1948. Members of this 
committee are: G. C. Campbell, T. N. E. Greville, C. Fisher, C. Spoerl, 
Chairman. 

Internal Work. The growth of the Institute has rendered parts of the Constitution
obsolete and a revision seems indicated. In particular, it appears that
the present system of elections is no longer satisfactory. The Institute is deeply 
indebted to its Committee on Planning and Development which has devoted 
much thought and consideration not only to a revision of the Constitution but 
also to the future development of the Institute as a whole. The membership 
had occasion to discuss the preliminary plans at two business meetings. M. H. 
Hansen acted as Chairman of the Committee; other members were: J. H. Curtiss, 
W. G. Cochran, J. Neyman, H. W. Norton, F. F. Stephan, J. W. Tukey, W. A. 
Wallis. 

A sharp increase in printing costs has, unfortunately, necessitated an increase 
in membership dues. However, the membership should rest assured that the 
financial position of the Institute is intrinsically sound. The cash prospects 
for 1948 are not rosy, but this is due principally to the necessity of reprinting
back-numbers of the Annals which in itself is a sign of health and promise of 
stability. At present the Institute has a considerable reserve in back numbers 
and this reserve is rapidly being transformed into cash. We are also exploring 
the possibilities of new revenue and have started a campaign to get advertisements
for the Annals. A possible campaign for institutional members is held
in abeyance pending a clarification of our formal relations with sister societies.
In order to make the Annals available in European countries with monetary
exchange restrictions, the dues and subscriptions have been increased only for
the Western Hemisphere. The investments of the Institute have been supervised
by the Finance Committee consisting of C. F. Roos, L. A. Knowler, F. F.
Stephan, and Paul S. Dwyer, Chairman. 

Last year’s Committee on Teaching completed its work and submitted a 
detailed report which will be of great value. It will be published in the Annals 
of Mathematical Statistics. The Committee has been dissolved with special 
thanks of the Board of Directors for their successful work. H. Hotelling was 
chairman and its members were Walter Bartky, W. Edwards Deming, Milton 
Friedman, and Paul Hoel. The Committee on Tabulation under the chairmanship
of C. Eisenhart and consisting of Paul S. Dwyer, H. Goldstine, A. Lowan,
H. W. Norton, and G. R. Stibitz has outlined the work for the coming years 
which promises to be of great interest. 

The Membership Committee consisted of C. C. Craig, P. G. Hoel, and J. H. 
Curtiss as Chairman. On its recommendations the following members were 
elected Fellows: T. W. Anderson, David Blackwell, Frederick Mosteller, Gerhard 
Tintner, Charles P. Winsor, Alexander Aitken, George Darmois, Ragnar Frisch, 
Robert C. Geary, and John Wishart. The Nominating Committee consisted of 
Meyer A. Girshick, Paul G. Hoel, Horace W. Norton, Frederick Mosteller, 
and George W. Snedecor, Chairman. A. Wald was nominated for President, 
and as an innovation four nominations for Vice-presidents were made: C. 
Eisenhart, A. M. Mood, Henry Scheffe, F. F. Stephan. 

The Annals of Mathematical Statistics are covered by a special report of the 
Editor. However, it is appropriate to say that the Institute takes pride in the 
development of the Annals. While members see only its spectacular success,
they should bear in mind that this is mostly due to the work of one man, S. S. 
Wilks. In view of the great variety of interests of our membership and the 
many desirable directions in which the Annals could develop, it is clear that 
the work of the Editor can not always be pleasing and naturally often means a 
nervous burden. I feel sure that I speak for all our members in expressing the 
Institute’s sincere thanks to S. S. Wilks not only for his work but also for his 
wisdom in striking a sensible balance between many wishes and possibilities 
and leading the Annals so successfully in a direction satisfactory to all of us. 

In thanking all other members who have contributed to the work of the Institute,
it is hard to find appropriate words to express appreciation for the unselfish
efforts and devotion of our Secretary-Treasurer. Few members will
realize how much of Dwyer’s time and thoughts are spent for the Institute
and how much the smooth running of the affairs of the Institute is due to his 
hard work. 

Finally, it is a pleasant duty to express our thanks and appreciation to Princeton
University and to the University of Michigan. These institutions have
generously provided office space and other help which has greatly facilitated 
our work and saved us expenses. 

Will Feller, 
President, 1947.

December 31, 1947. 



REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE FOR 1947 

At the beginning of 1947 the Institute had 900 members and during 1947, 
210 new members (10 of whom begin their membership with 1948) joined the
Institute. During 1947 the Institute lost 73 members, 43 by resignation, 25 
by suspension for non-payment of dues, and 5 by death. The Institute has 
1,037 members as it starts 1948. 

The following members died during the year: 

Margaret J. Dix
Professor Irving Fisher
Albert M. Freeman
Professor Henry A. Ruger
Professor James G. Smith

A summary of the financial transactions of the Institute is given in the Financial
Statement for 1947 which follows:


FINANCIAL STATEMENT
December 31, 1946 to December 31, 1947

A. Receipts

Balance on Hand,* December 31, 1946                                $7,241.55
Dues                                                                5,054.43
Life Membership Payments                                              287.50
Subscriptions                                                       2,892.93
Sale of Back Numbers                                                3,969.95
Net Income from Investments                                            63.00
Miscellaneous                                                          76.56

Total                                                             $19,585.92

B. Expenditures

Annals: Current
  Office of Editor                                     $160.40
  Waverly Press                                       7,145.79     $7,306.19

Annals: Back Numbers
  Reprinted 500 copies each: Vol. III #1 & 2, IV #2,
    V #2, VII #4, XI #1 & 4, XII #1, XIV #1, 2 & 3    3,039.00
  Iowa City Office                                      143.75      3,182.75

Mathematical Reviews and Inter Society for
  National Science Foundation                                         135.00

Office of the Secretary-Treasurer
  Printing, memoranda, etc. (including some
    stamped envelopes)                                1,100.49
  Postage, supplies, express, telephone calls
    and cables                                          400.00
  Clerical help                                       1,502.31      3,002.80

Miscellaneous                                                         100.81
Balance on Hand,* December 31, 1947                                 5,858.37

Total                                                             $19,585.92

* In bank deposits and government bonds.

C. Summary of Receipts and Expenditures

Balance on Hand,* December 31, 1946                                $7,241.55
Receipts during 1947                                               12,344.37
Expenditures during 1947                                           13,727.55
Balance on Hand,* December 31, 1947                                 5,858.37


D. Comparison of Assets on December 31, 1946 and December 31, 1947

                                                           '46           '47
U. S. Government G Bonds                             $5,000.00     $3,000.00
Life Membership Funds: Bonds                          1,888.00      1,888.00
Life Membership Funds: Bank Deposits                    139.50        427.00
Additional Bank Deposits                                214.05        543.37
Current Accounts Receivable                             452.62        423.55
Estimated Value (Cost) of back issues of Annals**     7,234.58     10,866.73

Total                                               $14,928.75    $17,148.65

Net Gain 1947                                                       2,219.90


E. Liabilities of Institute of Mathematical Statistics as of December 31, 1947

All bills which have been presented have been paid. The Life Membership Fund now
contains $2,315.00, which covers 30 members. Also, $3,348.11 has been paid in for 1948
(and later) dues and subscriptions.

The increase in the size of the Annals from 500 to 600 pages and the phenomenal
activity in the sales of back numbers are the two most important factors
to be considered in comparing the 1947 statement with those of previous years.
The Waverly Press bills for 1946 totalled $4,566.27 while the corresponding
amount for 1947 was $7,145.79, an increase of 56%. The increase is attributable
not only to the increased size of the Annals but also to the fact that printing
costs are rising rapidly and, to a less extent, to the fact that we are printing a
larger number of copies. It is to be noted that the cost of the Annals alone in
1947 was over $2,000 more than the amount received from dues. As a result
of the increase in dues, the 1948 report should be more satisfactory in this respect.

The phenomenal sales in back issues, noted in the report for 1946, were accelerated
in 1947. We sold nearly $4,000 of back issues. These extensive
sales were embarrassing to our cash position since they exhausted many of our
issues and the continued reprinting forced us to place a considerable portion of
our reserves in inventory (some of which probably will not be returned to cash
within decades). Eleven issues were reprinted during the first six months of
1947. The resulting low cash position forced a temporary change in the policy
of reprinting issues as they became exhausted.

** Cost of Annals calculated at 67 cents per copy.

It was necessary to cash two $1000 interest bearing G bonds to meet the 
Waverly and reprinting bills as they came due. These brought $1938.00 rather 
than $2000 as they have been valued in previous reports. As the income from 
bonds during the year was $125, I have entered the net income from investments
as $63.00. 

An attempt has been made to keep down the costs of the office of the Secretary- 
Treasurer. The expense for 1947 was about $100 more than the expense for 
1946 and seems very satisfactory in view of the larger membership and greatly 
increased costs of all materials and services. 

For the reasons indicated above, the cash position (including bonds and Life 
Membership payments) was lowered during the year by $1,383.18. This is 
compensated for by an increase in the value of the stock of back issues (valued 
at cost) of $3,632.15. Some members of the finance committee feel that it is 
improper to list all of this stock as assets since we can probably sell only a portion 
of it in the next five or ten years. However, we did sell nearly $4,000 of Annals 
in 1947 and it is indicated (at the new prices) that the sales of issues we have 
now on hand will yield us $11,000 in the next five to ten years. 
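
The figures quoted in this paragraph can be checked against section D of the Financial Statement. The following is only an illustrative sketch: the amounts are transcribed from section D, while the dictionary keys and the helper name `change` are labels introduced here and do not appear in the report.

```python
from decimal import Decimal as D

# Assets on December 31, 1946 and December 31, 1947, as listed in section D.
# The key names are illustrative labels, not terms used in the report.
assets = {
    "government_bonds":      (D("5000.00"), D("3000.00")),
    "life_membership_bonds": (D("1888.00"), D("1888.00")),
    "life_membership_bank":  (D("139.50"),  D("427.00")),
    "other_bank_deposits":   (D("214.05"),  D("543.37")),
    "accounts_receivable":   (D("452.62"),  D("423.55")),
    "back_issues_at_cost":   (D("7234.58"), D("10866.73")),
}

# "Cash position" in the report's sense: bonds plus all bank deposits.
cash_items = [
    "government_bonds",
    "life_membership_bonds",
    "life_membership_bank",
    "other_bank_deposits",
]


def change(keys):
    """Sum of (1947 value - 1946 value) over the named asset items."""
    return sum(assets[k][1] - assets[k][0] for k in keys)


cash_change = change(cash_items)                   # fall in cash position
stock_change = change(["back_issues_at_cost"])     # rise in stock of back issues
receivable_change = change(["accounts_receivable"])
net_gain = change(assets)                          # all six items together

print(cash_change, stock_change, receivable_change, net_gain)
```

On these figures the fall in cash and the rise in inventory agree with the amounts stated in the text, and the small decline in accounts receivable accounts for the remainder of the net gain shown in section D.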

Many of the issues which were stored in Iowa City have been sold and Professor
Knowler has sent the remaining issues to Ann Arbor. I wish to acknowledge
the work of Professor Knowler in caring for these issues and to express the
appreciation of the Institute for his efforts over a period of years. I also wish 
to express my appreciation to Mr. Carl Bennett who contributed much time 
and energy in looking after the back issues at Ann Arbor. 

This report does not cover the amount of $390.20 which is held temporarily 
by the Institute for the fund for Annals for Countries Devastated by War. 
Arrangements are being made to purchase Annals for certain institutions which 
the Committee is recommending. 

Paul S. Dwyer, 
Secretary-Treasurer. 

December 31, 1947. 



REPORT OF THE EDITOR FOR 1947 

During the past year the increase in the number of manuscripts submitted 
to the Annals has continued. More manuscripts have been received from 
foreign countries than in any preceding year. During 1947 papers were published
by authors in Argentina, Australia, Canada, England, France and Sweden.
If manuscripts continue to be received at the present rate it will not be possible 
to publish them in the Annals without further expansion. The gap between 
receipts of manuscripts and publication is likely to become serious by the end 
of 1948. The 1947 volume of the Annals contained 56 papers of which 25 were 
short notes. The total number of pages printed was 618, representing an 
increase of approximately 11% over the size of the 1946 volume. It now appears 
that increased printing costs will prevent a further increase in the size of the 
Annals for 1948. It is therefore extremely important that authors submitting 
papers to the Annals make every effort to keep their papers as brief as possible. 

Contributions to probability and statistical theory are continuing to come 
in from a wide variety of fields. They were written by biologists, chemists, 
economists, mathematical statisticians, mathematicians and physicists, representing
universities, government agencies and laboratories, business and
industrial organizations. Some of these contributions are rather heterogeneous 
in quality of results and presentation. However, patient attempts are being 
made to have all papers with novel and interesting results suitably revised and 
published. Attempts to have expository papers prepared are being continued. 

The Editor wishes to take this opportunity to acknowledge, on behalf of the 
Editorial Committee, the generous refereeing assistance which has been given 
by the following persons: L. A. Aroian, Z. W. Birnbaum, David Blackwell,
A. H. Bowker, I. W. Burr, G. W. Brown, K. L. Chung, W. J. Dixon, T. N. E.
Greville, F. E. Grubbs, J. B. S. Haldane, T. E. Harris, C. Hastings, L. Henkin,
G. A. Hunt, B. F. Kimball, T. Koopmans, S. Kullback, E. L. Lehmann, H.
Levene, H. B. Mann, P. J. McCarthy, W. E. Milne, R. Otter, M. P. Peisakoff,
H. E. Robbins, L. J. Savage, F. F. Stephan, D. F. Votaw, and J. E. Walsh.

The Editor is also indebted to the following persons at Princeton University
for preparation of manuscripts for the printer, and other editorial and office
assistance: Miss Jacqueline G. Foster, M. F. Freeman and J. E. Walsh.

S. S. Wilks, 
Editor.

December 31, 1947. 





CONSTITUTION AND BY-LAWS 
OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 


Constitution 
ARTICLE I 
Name and Purpose 

1. This organisation shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
members excepted, who have been members for twenty-three months prior to the date 
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term 
as determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be one
year and that of the Secretary-Treasurer three years. Elections shall be by majority 
ballots at Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term 
of two years, and another for a term of three years. Thereafter the Board of Directors 
shall elect from among the Fellows one member annually at their first meeting after their 
election for a term of three years. The President shall designate one of the
Vice-Presidents as Chairman of this Committee.

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall 
be given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President 
may be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board may 
be held from time to time at the call of the President or any two members of the Board. 
Notice of each meeting of the Board, other than the two regular meetings, together with 
a statement of the business to be brought before the meeting, must be given to the members
of the Board by the Secretary-Treasurer at least five days prior to the date set therefor.
Should other business be passed upon, any member of the Board shall have the
right to reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the 
call of the Chairman or any member of the Committee provided notice of such call and 
the purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Membership,
two members shall constitute a quorum.

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty
days before the date of the meeting at which the proposal is to be acted upon. Voting
may be in person or by mail. 





By-laws 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and 
Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of
the President and both Vice-Presidents, a Fellow selected by vote of the Fellows present
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the 
time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at 
the meetings of the Institute and of the Board of Directors, send out calls for said meetings
and, with the approval of the President and the Board, carry on the correspondence
of the Institute. Subject to the direction of the Board, he shall have charge of the archives
and other tangible and intangible property of the Institute and upon the direction
of the Board he shall publish in the Annals of Mathematical Statistics a classified list of all
Members and Fellows of the Institute. He shall send out calls for annual dues and acknowledge
receipt of same; pay all bills approved by the President for expenditures
authorized by the Board or the Institute; keep a detailed account of all receipts and expenditures,
prepare a financial statement at the end of each year and present an abstract
of the same at the annual meeting of the Institute after it has been audited by a Member 
or Fellow of the Institute appointed by the President as Auditor. The Auditors shall 
report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the responsibility
for all editorial matters concerning the editing of the Annals of Mathematical Statistics.
He shall, with the advice and consent of the Board, appoint an Editorial Committee
of not less than twelve members to co-operate with him; four for a period of five
years, four for a period of three years, and the remaining members for a period of two 
years, appointments to be made annually as needed. All appointments to the Editorial 
Committee shall terminate with the appointment of a new Editor. The Editor shall 
serve as editorial adviser in the publication of all scientific monographs and pamphlets 
authorized by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the Institute,
with the exception of those affairs specifically assigned to the President or to the
Committee on Membership. The Board shall have authority to fill all vacancies ad
interim, occurring among the Officers, Board of Directors, or in any of the Committees.
The Board may appoint such other committees as may be required from time to time to 
carry on the affairs of the Institute. The power of election to the different grades of 
Membership, except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the
different grades of membership. The Committee shall review these qualifications periodically
and shall make such changes in these qualifications and make such recommendations
with reference to the number of grades of membership as it deems advisable. The power 
to elect worthy applicants to the grades of Member and Junior Member shall reside in 
the Committee, which may delegate this power to the Secretary-Treasurer, subject to 
such reservations as the Committee considers appropriate. The Committee shall make 
recommendations to the Board of Directors with reference to placing members in other 
grades of membership. The Committee shall give its attention to the question of increasing
the number of applicants for membership and shall advise the Secretary-Treasurer
on plans for that purpose.


ARTICLE II 
Dues 

1. Members shall pay seven dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members and Fellows
shall pay seven dollars annual dues. Honorary members shall be exempt from all dues. 

A Sustaining Member shall pay annual dues of a multiple of one hundred dollars. 

An approved nominee of a Sustaining Member shall be a member in good standing 
without payment of dues for each year in which he is nominated provided that in that 
year he has been a member for less than three years. 

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, their dues 
shall each be reduced by twenty-five per cent. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding annual dues and which will not otherwise
alter his status as a Member or Fellow and will be based upon a suitable table and
rate of interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Fellow of the Institute serving, except as a commissioned
officer, in the Armed Forces of the United States, or of a friendly power, will, upon
notification to the Secretary-Treasurer, be excused from the payment of dues until the 
January first following his discharge from service or his commissioning as an officer. He 
shall have all privileges of membership except that he shall not receive the Official Journal. 
However, during the first year of his resumed membership he may elect to receive one 
copy of each volume of the Official Journal published during the period of his service 
membership by paying one-half of the total of dues excused. 

(d) Exception. Anyone who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. Five dollars of the annual dues of each Member and Fellow shall be for a subscription
to the Official Journal. Fifteen dollars of the dues of each Sustaining Member shall
be for two subscriptions to the Official Journal, and the binding of one copy. 

4. For each one hundred dollars of annual dues, a Sustaining Member shall be entitled
to nominate two persons for membership in the Institute. 

5. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such a notice by a copy of this article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent to the Board of Directors.
The Board of Directors may strike the delinquent’s name from the rolls and withdraw 
all privileges of membership, and may reinstate the delinquent upon payment of arrears 
of dues. 


ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amendment
has been previously approved by the Board of Directors.




DISCRIMINANT FUNCTIONS WITH COVARIANCE 

By W. G. Cochran and C. I. Bliss

North Carolina State College; Connecticut Agricultural Experiment Station and 

Yale University 

1. Summary. This paper discusses the extension of the discriminant function
to the case where certain variates (called the covariance variates) are known
to have the same means in all populations. Although such variates have no 
discriminating power by themselves, they may still be utilized in the discriminant 
function. 

The first step is to adjust the discriminators by means of their ‘within-sample’ 
regressions on the covariance variates. The discriminant function is then 
calculated in the usual way from these adjusted variates. The standard tests of 
significance for the discriminant function (e.g. Hotelling’s T² test) can be extended
to this case without difficulty. A measure is suggested of the gain in
information due to covariance and the computations are illustrated by a numerical
example. The discussion is confined to the case where only a single function
of the population means is being investigated. 

2. Introduction. Discriminant function analysis is now fairly well advanced 
for the case where there are only two populations. The data consist of a number 
of measurements, called the discriminators, that have been made on each member
of a random sample from each population. The technique has various uses. 
Fisher [1] used it in seeking a linear function of the measurements that could be 
employed to classify new observations into one or other of the two populations. 
He pointed out [2] that a test of significance of the difference between the two 
samples, developed from his discriminant, was identical with Hotelling’s generalization
of Student’s t test, discovered some years earlier [3]. Mahalanobis’ concept
of the generalized distance between two populations [4] was also found to
be closely related to the discriminant function. In any of these applications— 
to classification, testing significance, or estimating distance—we may also be 
interested in considering whether certain of the measurements really contribute 
anything to the purpose at hand, and helpful tests of significance are available 
for this purpose. 
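
For orientation, the three quantities just mentioned are related as follows in the now-usual two-sample formulation. The notation is an assumption introduced here for illustration and is not necessarily the authors' own: the vectors \bar{x}_1 and \bar{x}_2 are the sample means of the discriminators, n_1 and n_2 are the sample sizes, and S is the pooled within-sample covariance matrix.

```latex
\begin{align*}
  a   &\propto S^{-1}(\bar{x}_1 - \bar{x}_2)
      && \text{(coefficients of the linear discriminant)} \\
  D^2 &= (\bar{x}_1 - \bar{x}_2)'\, S^{-1}\, (\bar{x}_1 - \bar{x}_2)
      && \text{(estimated generalized distance)} \\
  T^2 &= \frac{n_1 n_2}{n_1 + n_2}\, D^2
      && \text{(Hotelling's statistic)}
\end{align*}
```

All three are built from the same mean difference and the same within-sample matrix, which is why the approaches lead to equivalent tests.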

Recently the authors encountered a problem in which it seemed advisable to 
combine discriminant function analysis with the analysis of covariance. This 
case occurs whenever, in addition to the discriminators, there is a measurement 
whose mean is known to be the same in both populations. Suppose, for example, 
that the I.Q.’s of each of a sample of students are measured. The sample is 
then divided at random into two groups, each of which subsequently receives a 
different type of training. Measurements made at the end of the period of training
would be potential discriminators, but in the case of the initial I.Q.’s we can
clearly assume that there is no difference in the means of the populations
corresponding to the two groups.

The initial I.Q. measurements are of course of no use in themselves in studying 
differences introduced by the training. Nevertheless, if they are correlated with 
the discriminators, they may serve in some way to ‘improve’ the discriminant:
e.g. to increase the power of Hotelling’s T² test, or to reduce the number of errors
in classification. This paper discusses the problem of utilizing such measurements,
which will be called covariance variates. The problem is analogous
to that which is solved by the analysis of covariance. In covariance, as applied 
for instance in a controlled experiment, variates that are unaffected by the 
experimental treatments can be used to provide more accurate estimates of the 
effects of the treatments or to increase the power of the F test of the differences 
among the treatment means. 

The procedure suggested is as follows. First, the multiple regression is obtained
of each discriminator on all the covariance variates. These regressions
are calculated from the ‘within-sample’ sums of squares and products: that is,
from the sums of squares and products of deviations of the individual measurements
from their sample means. Each discriminator is then replaced by its
deviations from the multiple regression, and a new discriminant function is 
calculated in the usual way from these deviations. The extensions of Hotelling’s 
T² and Mahalanobis’ distance are both obtained from this discriminant, though a
further adjustment factor is needed for tests of significance. 
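
The steps of this procedure can be sketched in code for the two-sample case. This is a minimal illustration under stated assumptions, not the authors' computing scheme: it uses NumPy, the names `within_sscp`, `adjusted_discriminant`, `x1`, `x2`, `z1`, `z2` are hypothetical, the data are synthetic, and the further adjustment factor needed for tests of significance is not included.

```python
import numpy as np


def within_sscp(samples):
    """Pooled within-sample sums of squares and products.

    Each sample contributes the products of deviations of its rows
    from that sample's own column means.
    """
    total = 0.0
    for s in samples:
        dev = s - s.mean(axis=0)
        total = total + dev.T @ dev
    return total


def adjusted_discriminant(x1, x2, z1, z2):
    """x1, x2: (n_i, p) discriminators; z1, z2: (n_i, q) covariance variates."""
    p = x1.shape[1]
    # Within-sample sums of squares and products of all variates together.
    w = within_sscp([np.hstack([x1, z1]), np.hstack([x2, z2])])
    w_xz, w_zz = w[:p, p:], w[p:, p:]
    # Multiple regression of each discriminator on all covariance variates;
    # column j of b holds the coefficients for discriminator j.
    b = np.linalg.solve(w_zz, w_xz.T)
    # Replace each discriminator by its deviation from the regression.
    y1, y2 = x1 - z1 @ b, x2 - z2 @ b
    # Discriminant computed in the usual way from the adjusted variates:
    # coefficients proportional to (within SSCP)^{-1} (difference of means).
    d = y1.mean(axis=0) - y2.mean(axis=0)
    e = within_sscp([y1, y2])
    coef = np.linalg.solve(e, d)
    return b, d, e, coef


# Synthetic illustration: one covariance variate, two discriminators.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(30, 1)), rng.normal(size=(30, 1))
x1 = 0.8 * z1 + rng.normal(size=(30, 2)) + np.array([1.0, 0.0])
x2 = 0.8 * z2 + rng.normal(size=(30, 2))
b, d, e, coef = adjusted_discriminant(x1, x2, z1, z2)
print(coef)
```

Forming the residuals explicitly mirrors the wording of the text; the same within-sample matrix for the adjusted variates could instead be obtained directly as the residual sums of squares and products after regression.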

This paper is arranged in three parts. Part I presents a numerical example. 
The decision to place the example first was taken because most of the actual 
applications of the discriminant function in the literature appear to have been 
made by persons relatively unfamiliar with the theory of multivariate analysis. 
It is hoped that with the aid of the example readers in this class may be able to 
utilize covariance variates. For the same reason, the calculations have been 
presented as far as possible in terms of the operations of ordinary multiple
regression, rather than in the form in which they first emerge from the theory.
Actually, various equivalent methods of calculation are available, and it is not 
claimed that our method is necessarily the best. A mathematical statistician 
may prefer to follow the computing methods which come directly from theory 
(Part II, section 13). 

The example is more complex in structure than the two-sample case. The 
data constitute a two-way classification, in which the row means are nuisance 
parameters, being of no interest, while only a single linear function of the column 
means is of interest. It is well known that the ordinary t test can be applied 
not only to the difference between two sample means, but to any linear function 
of a number of sample means in data that are quite complex. Discriminant 
function technique can be extended in the same way, and readers familiar with 
the analysis of variance should find no great difficulty in making the appropriate 
extension to such data. 

Part II presents the theory. The reader who is primarily interested in theory 



DISCRIMINANT FUNCTIONS 




should read Part II before Part I. Since the approaches used by Mahalanobis, 
Hotelling and Fisher all converge, we have chosen that of Mahalanobis, mainly 
because the extension of his techniques to include covariance variates seems 
straightforward. Maximum likelihood estimation of the generalized distance 
is presented in full for the two-population case. The frequency distribution of 
the estimated distance and the extension of the $T^2$ test are worked out. An
attempt is also made to obtain a quantity that will measure what has been gained 
by the use of covariance. 

In order to illustrate how the theory applies with other types of data, the 
mathematical model is given for the row by column classification that occurs in 
the example. The major results for this model are indicated, though without 
proof. 

In Part III it is shown that the computational methods used in the example 
are equivalent to those developed by theory. While this can easily be verified 
in a particular case, it is not intuitively obvious. 

PART I Numerical Example 

3. Description. The data form part of an experiment on the assay of insulin 
of which other parts have been published [5]. Twelve rabbits were used. 
Each rabbit received in succession four doses of insulin, equally spaced on a log. 
scale. An interval of eight days or more elapsed between successive doses, and 
the order in which the doses were given to any rabbit was determined by
randomization. Thus the experiment is of the ‘randomized blocks’ type, where each
rabbit constitutes a block and there are 12 blocks with 4 treatments each. 

The effect of insulin is usually measured by some function of the blood sugar 
of the rabbit in periodic bleedings after injection of the insulin. The blood sugar 
was measured for each rabbit at 1, 2, 3, 4, and 5 hours after injection, and also 
before injection. In order to simplify the arithmetic, only the initial blood sugar 
and the blood sugars at 3 and 4 hours after injection will be considered here. 
These data are shown for the first three rabbits (with totals for all 12 rabbits)
in Table I. 

Let $x_{iwz}$ be a typical observation of blood sugar, where $i = 3, 4$ stands for the
hour after injection, $w$ for the rabbit and $z$ for the dose. The mathematical
model to be used is as follows.

(1)   $$x_{iwz} = \mu_i + \rho_{iw} + \gamma_{iz} + \beta_{i0}(x_{0wz} - \bar{x}_{0\cdot\cdot}) + e_{iwz}.$$

The parameters $\mu_i$, $\rho_{iw}$ and $\gamma_{iz}$ represent the true mean and the effects of rabbit
and log dose respectively. The quantity $x_{0wz}$ is the initial blood sugar for the
rabbit $w$ before the test at dose $z$, while $\bar{x}_{0\cdot\cdot}$ is the average initial blood sugar over
the whole experiment. The blood sugar at $i$ hours has been found experimentally
to be correlated with the corresponding initial blood sugar, and the relationship
is represented here as a linear regression, with $\beta_{i0}$ as the regression coefficient.
The residuals $e_{iwz}$ are assumed to follow a multivariate (in this case bivariate)
normal distribution, with zero means. The covariance between $e_{iwz}$ and $e_{jwz}$





is taken as $\sigma_{ij\cdot 0}$. The model is the standard one for the ordinary analysis of
covariance, except that we have two measures of the effect of insulin, $x_3$ and $x_4$.

One additional assumption was made. For all post-injection readings, the
blood sugar seemed linearly related to the log dose $t_z$. Since this result has been
found in other experiments, we assumed that

$$\gamma_{iz} = \delta_i t_z,$$

where $\delta_i$ is the regression coefficient of blood sugar on log dose.

4. Object of the analysis. Our object was to find the linear combination of 
the three blood sugar readings that would measure best the effect of the insulin. 
Because of the linearity of the regression on log dose, the effect of insulin on each 

TABLE 1

Sample of original data on blood sugar levels in insulin experiment

              Initial blood sugar, $x_0$      Three hours, $x_3$         Four hours, $x_4$
Log dose:      .32    .47    .62    .77     .32    .47    .62    .77     .32    .47    .62    .77

Rabbit No.
    1           75     94    107     94      95     76     67     56      90     95    115     91
    2           91     86     83     93      98     90     77     69     104     87     90     89
    3           97     99     90     91      84     76     59     48      93    102     85     90

Total*        1065   1074   1121   1070     932    872    731    591    1098   1026    970    847

*12 rabbits.


$x_i$ is known completely if the slope $\delta_i$ is known. It seems reasonable to choose
the linear compound of the $x_i$’s which will give the maximum ratio when its
estimated regression on log dose is divided by the estimated standard error of
this regression. We now consider how to obtain this maximum. The argument
given below is not intended to prove the validity of the method, for which
reference should be made to Part II.

The true regression of the original blood sugar $x_0$ on log dose is known to be
zero. Hence, it is clear that the variate $x_0$ is useful only in so far as it enables
us to obtain more accurate estimates of $\delta_3$ and $\delta_4$. For this purpose we need to
estimate the effect of $x_0$ upon $x_3$ and $x_4$, the blood sugar readings at 3 and 4 hours,
independently of dose of insulin or of differences between rabbits. From the
standard theory of covariance the best estimate is the regression coefficient
$b_{i0} = E_{i0}/E_{00}$, where $E$ denotes a sum of squares or products calculated from
the error line in the analysis of covariance; that is, from the sums of squares and
products of deviations of the $x_i$ from the fitted regression on row and column
parameters.







The regression of the blood sugar at each hour on the log dose of insulin is
calculated from totals adjusted for the regression on $x_0$. Since the 4 successive
log doses ($z = 1, 2, 3, 4$) are spaced equally, they may be replaced in the
computation by the coded doses -3, -1, +1, and +3. If we let $T_{iz}$ be the total
blood sugar, summed over 12 rabbits, at the $i$th hour with dose $z$, the following
result is well known for the analysis of covariance. The best estimate of
$\delta_i$ ($i = 3, 4$) is

$$[(-3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}) - b_{i0}(-3T_{01} - T_{02} + T_{03} + 3T_{04})]/240.$$

The divisor, 240, is $12(3^2 + 1^2 + 1^2 + 3^2)$. The expression may be written

$$\frac{d'_i}{240} = \frac{d_i - b_{i0}d_0}{240},$$

where

$$d_i = -3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}.$$
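The coded-dose contrast can be checked numerically. The following is a minimal sketch in Python, using only the dose totals of $x_0$ printed in Table 1; the function name `contrast` is ours and is not part of the original computation.

```python
# Coded doses that replace the four equally spaced log doses.
coded = [-3, -1, 1, 3]
n_rabbits = 12

# Dose totals of the initial blood sugar x_0 over the 12 rabbits (Table 1).
T0 = [1065, 1074, 1121, 1070]


def contrast(totals, weights=coded):
    """Numerator d_i of the regression of a variate on the coded doses."""
    return sum(w * t for w, t in zip(weights, totals))


# Divisor of the regression: 12 * (3^2 + 1^2 + 1^2 + 3^2).
divisor = n_rabbits * sum(w * w for w in coded)
d0 = contrast(T0)

print(divisor)  # 240
print(d0)       # 62
```

The value 62 is the quantity $d_0$ that appears on the right-hand side of the equations used later in the example.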

A linear combination is formed from $d'_3$ and $d'_4$, the numerators in the best
estimates of $\delta_3$ and $\delta_4$, by means of the coefficients $L_3$ and $L_4$. $L_3$ and $L_4$ are
computed so as to maximize the ratio of

$$d'_I = L_3 d'_3 + L_4 d'_4$$

to its estimated standard error.

From the definition of $d'_I$, this requires a discriminant of the form

$$I = L_3(x_{3wz} - b_{30}x_{0wz}) + L_4(x_{4wz} - b_{40}x_{0wz}),$$

where each $x_{0wz}$ is measured from its mean.

We require next the estimated standard error of $d'_I$. This depends, in turn,
upon the variances of $d'_3$ and $d'_4$ and their covariance. As usual in the analysis
of variance we have

(5)   $$V(d'_3) = V(d_3) + d_0^2\,V(b_{30}) = \sigma_{33\cdot 0}\left(240 + \frac{d_0^2}{E_{00}}\right).$$

The residual variance $\sigma_{33\cdot 0}$ is estimated from the sums of squares and products
in the error row of the analysis of covariance as

$$E_{33\cdot 0}/n = (E_{33} - E_{30}^2/E_{00})/n,$$

where $n$ is the degrees of freedom in each $E_{ij}$ diminished by one. Similar methods
lead to the variance of $d'_4$ and to the covariance of $d'_3$ and $d'_4$. It follows that the
true variance of $d'_I$ may be written

(6)   $$V(d'_I) \propto L_3^2\,\sigma_{33\cdot 0} + 2L_3L_4\,\sigma_{34\cdot 0} + L_4^2\,\sigma_{44\cdot 0},$$

where the factor $\left(240 + d_0^2/E_{00}\right)$ in equation (5) is omitted since it does not involve





the $L$'s. Similarly, the estimated variance of $d'_I$, apart from constant factors,
may be written as

(7)   $$L_3^2E_{33\cdot 0} + 2L_3L_4E_{34\cdot 0} + L_4^2E_{44\cdot 0}.$$

The quantity to be maximized is therefore

$$\frac{L_3d'_3 + L_4d'_4}{\sqrt{L_3^2E_{33\cdot 0} + 2L_3L_4E_{34\cdot 0} + L_4^2E_{44\cdot 0}}}.$$

Formally, this is the same type of quantity that is maximized in ordinary analysis
with the discriminant function. Differentiation with respect to the $L$’s leads
to the equations (after omission of another constant factor)

(8)   $$E_{33\cdot 0}L_3 + E_{34\cdot 0}L_4 = d'_3, \qquad E_{34\cdot 0}L_3 + E_{44\cdot 0}L_4 = d'_4.$$

The objective of the computation, therefore, is to obtain discriminant coefficients
having the same ratio to each other as $L_3$ and $L_4$ in equations (8). As will be
shown in the next section, this can be accomplished in practice more conveniently
by substituting an alternative set of three simultaneous equations for the two
in equations (8).

5. Calculations. The first step is to form the sums of squares and products 
in the analysis of covariance. With 12 rabbits and 4 doses, the conventional 
breakdown of each total sum is into components for rabbits (11 d.f.), doses 
(3 d.f.) and rabbits X doses (33 d.f.). Because of the assumed linear regression 
on log dose, the sum of squares for doses was further divided into two
components. The first (1 d.f.) is the contribution due to this regression. For $x_i$,
the sum of squares due to regression is $d_i^2/240$, or in the case of $x_3$, $(1164)^2/240$,
or 5645. The remaining component (2 d.f.) is called the curvature, since it
measures the effect of deviations from the linear regression. The sum of squares 
for curvature is found by subtraction. 

The following points may be noted. (i) For both $x_3$ and $x_4$, the F ratio of the
curvature mean square to the rabbits X doses mean square will be found to be
less than 1, so that the data do not suggest rejection of the hypothesis of a linear
regression on log dose. (ii) The F ratios of the regression mean squares to the
rabbits X doses mean squares are highly significant, being 57.8 for $x_3$ and 28.7
for $x_4$. This indicates, incidentally, that the three-hour reading may be a more
responsive measure of the effect of insulin than the four-hour reading. (iii) With
$x_0$, the F ratio does not approach significance for either the regression or the
curvature, as is to be expected. 
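The F ratios quoted in points (i) to (iii) can be reproduced from the entries of Table 2. The sketch below is illustrative only; the dictionary names are ours, and the figures are the rounded sums of squares printed in the table, so the last digit of a ratio may differ slightly from a calculation on the raw data.

```python
# Numerators d_i of the regressions on coded dose (i = 0, 3, 4).
d = {0: 62, 3: -1164, 4: -809}
# Sums of squares from Table 2: between doses (3 d.f.) and rabbits x doses (33 d.f.).
doses_ss = {0: 168, 3: 5806, 4: 2810}
error_ss = {0: 2351, 3: 3223, 4: 3137}
error_df = 33
divisor = 240

F_regression = {}
F_curvature = {}
for i in (0, 3, 4):
    regression_ss = d[i] ** 2 / divisor          # 1 d.f.
    curvature_ss = doses_ss[i] - regression_ss   # 2 d.f., found by subtraction
    error_ms = error_ss[i] / error_df
    F_regression[i] = regression_ss / error_ms
    F_curvature[i] = (curvature_ss / 2) / error_ms
    print(i, round(regression_ss), round(F_regression[i], 1), round(F_curvature[i], 2))
```

For $x_3$ and $x_4$ the regression ratios come out near 57.8 and 28.7, and both curvature ratios fall below 1, in agreement with the text.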

A consequence of the assumption of linear regression on log dose is that the 
curvature mean squares and products are estimates of the same quantities as 
the rabbits X doses mean squares and products. Consequently, the lines for 
curvature and rabbits X doses in Table 2 could be added to give 35 d.f. for the 
'error' sums of squares or products, $E_{ij}$, etc. We decided, however, to estimate




the error only from the 33 d.f. for rabbits X doses. This was done because it 
seemed to facilitate a test of the curvature of the final discriminant $I$. (This
test will not be reported here.) 

The $L$'s could now be obtained from equations (8). In this case the first
equation would contain the terms

$$E_{33\cdot 0} = 3223 - (1259)^2/2351; \qquad E_{34\cdot 0} = 1200 - (1259)(1340)/2351;$$

$$d'_3 = d_3 - b_{30}d_0 = -1164 - (1259/2351)(62),$$

leading to the simultaneous equations

$$2548.8L_3 + 482.4L_4 = -1197.2$$
$$482.4L_3 + 2373.2L_4 = -844.3,$$

which give $L_3/L_4 = -.41848/-.27070 = 1.5459$.
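As a check on the arithmetic, equations (8) can be formed and solved directly from the error line of Table 2. This is a sketch under the assumption that the error line is the rabbits X doses row with 33 d.f.; the helper names `e`, `adjusted` and `d_adj` are ours.

```python
# Error-line (rabbits x doses, 33 d.f.) sums of squares and products from Table 2.
E = {(0, 0): 2351, (3, 3): 3223, (4, 4): 3137,
     (0, 3): 1259, (0, 4): 1340, (3, 4): 1200}
# Numerators of the regressions on coded dose.
d = {0: 62, 3: -1164, 4: -809}


def e(i, j):
    """Symmetric lookup of an error sum of squares or products."""
    return E[(min(i, j), max(i, j))]


def adjusted(i, j):
    """E_{ij.0}: error sum of products after removing the regression on x_0."""
    return e(i, j) - e(i, 0) * e(j, 0) / e(0, 0)


def d_adj(i):
    """d'_i = d_i - b_{i0} d_0, with b_{i0} = E_{i0} / E_{00}."""
    return d[i] - e(i, 0) / e(0, 0) * d[0]


a, b, c = adjusted(3, 3), adjusted(3, 4), adjusted(4, 4)
r3, r4 = d_adj(3), d_adj(4)

# Solve the 2 x 2 system (8) by Cramer's rule.
det = a * c - b * b
L3 = (r3 * c - b * r4) / det
L4 = (a * r4 - b * r3) / det

print(round(a, 1), round(b, 1), round(c, 1))  # 2548.8 482.4 2373.2
print(round(r3, 1), round(r4, 1))             # -1197.2 -844.3
print(round(L3 / L4, 3))                      # 1.546
```

Only the ratio $L_3/L_4$ matters at this stage, since a constant factor was dropped in deriving equations (8).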

TABLE 2

Sums of squares and products

Component               d.f.   $x_0^2$   $x_3^2$   $x_4^2$   $x_0x_3$   $x_0x_4$   $x_3x_4$

Between rabbits          11       886     9376    11165      1952      2477      9206
Between doses             3       168     5806     2810      -247       -98      3981
  Reg. on log dose        1        16     5645     2727      -301      -209      3924
  Curvature               2       152      161       83        54       111        57
Rabbits X doses          33      2351     3223     3137      1259      1340      1200

Total                    47      3405    18405    17112      2964      3719     14387


Instead of using these equations, we propose to solve alternatively the set of
three equations

      $$S_{00}L_0 + S_{03}L_3 + S_{04}L_4 = d_0$$
(9)   $$S_{30}L_0 + S_{33}L_3 + S_{34}L_4 = d_3$$
      $$S_{40}L_0 + S_{43}L_3 + S_{44}L_4 = d_4,$$

where each $S_{ij}$ ($i, j = 0, 3, 4$) is the sum of squares or products formed by adding
the error line in the analysis of variance to the line for regression on log dose.
Thus $S_{ij}$ has 34 d.f. The ratio of $L_3$ to $L_4$, as found from equations (9), is exactly
the same as that found from the original equations (8), as is proved in section 18.
Further, the solution of the new equations seems to be more useful for performing
tests of significance, as will appear in following sections.

Accordingly, the first step after forming the analysis of variance is to set up 
the three equations (9). 












The equations are solved by means of the inverse matrix. The values of $d_i$
on the right side of the equations are replaced successively by 1, 0, 0; by 0, 1, 0;
and by 0, 0, 1 to obtain the three sets of values for $L_0$, $L_3$ and $L_4$. These results
are given in the first three columns of Table 4 and are designated as $c_{ij}$.

The $L$’s follow from the $c_{ij}$ by the usual rule for regressions. For example,

$$L_3 = \{(.003209)(62) + (.227781)(-1164) + (-.199655)(-809)\} \cdot 10^{-3} = -.103417.$$

TABLE 3

Equations for determining $L_3$ and $L_4$

2367$L_0$ +  958$L_3$ + 1131$L_4$ =    62
 958$L_0$ + 8868$L_3$ + 5124$L_4$ = -1164
1131$L_0$ + 5124$L_3$ + 5864$L_4$ =  -809


The composite response or discriminant, adjusted for the covariance variate, is
now taken as

$$I = L_3\left(x_3 - \frac{E_{30}}{E_{00}}x_0\right) + L_4\left(x_4 - \frac{E_{40}}{E_{00}}x_0\right)$$

or

$$-.103417\left(x_3 - \frac{1259}{2351}x_0\right) - .066883\left(x_4 - \frac{1340}{2351}x_0\right)$$

$$= .093503x_0 - .103417x_3 - .066883x_4.$$

Note that the value of $L_0$ is not used at this stage and that $L_3/L_4 = 1.546$ agrees
with the value found from equations (8).
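The three equations of Table 3 can also be solved directly rather than through the inverse matrix. The sketch below uses a small Gaussian elimination routine of our own (`solve` is not from the paper) and then forms the coefficient of $x_0$ in the adjusted discriminant.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Equations (9) as set out in Table 3 (error line plus regression on log dose).
S = [[2367, 958, 1131],
     [958, 8868, 5124],
     [1131, 5124, 5864]]
d = [62, -1164, -809]

L0, L3, L4 = solve(S, d)

# Coefficient of x_0 in the adjusted discriminant, using b_{i0} = E_{i0} / E_{00}.
coef_x0 = -(L3 * 1259 / 2351 + L4 * 1340 / 2351)

print(round(L3 / L4, 3))  # 1.546
```

The solution should agree with the $L$'s of Table 4 to about the last printed figure, and `coef_x0` with the coefficient .093503 of $x_0$.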

TABLE 4

Inverse matrix ($\times 10^3$) and $L$’s

           ($10^3 c_{ij}$)                     $d_i$        $L_i$

  .465408     .003209    -.092568          62      .100008
  .003209     .227781    -.199655       -1164     -.103417
 -.092568    -.199655     .362846        -809     -.066883


A similar method may be followed when there are more discriminators or more
covariance variates. With two covariance variates, $x_0$ and $x'_0$, for instance, the
adjusted discriminant would be

$$L_3(x_3 - b_{30}x_0 - b'_{30}x'_0) + L_4(x_4 - b_{40}x_0 - b'_{40}x'_0),$$

where $b_{30}$, $b'_{30}$ are the partial regression coefficients of $x_3$ on $x_0$, $x'_0$ respectively,
determined from the error line, and similarly for $x_4$. Further, since any linear





function of the column (dose) means may be represented as a regression on some
variate $t_z$, this method may be applied to any linear function of the column means
in which we are interested, provided that the mathematical model is appropriate.

6. Test of the regression of the adjusted discriminant on log dose. The
numerator of the regression of $I$ on the coded doses is

$$d'_I = L_3(d_3 - b_{30}d_0) + L_4(d_4 - b_{40}d_0).$$

Since the regressions of $x_3$ and $x_4$ on the coded doses were both significant, it
may be confidently expected that the regression of $I$ will also be significant. The
test of significance will, however, be given in case it may be useful in other
applications. For those who are familiar with multiple regression, the test is perhaps
most easily made by means of a device due to Fisher [2].

Construct a dummy variate $y_{wz}$ such that $y_{wz}$ is always equal to $t_z$, or in our
case to the coded doses. That is, $y$ takes the value -3 for all observations at

TABLE 5

Analysis of $y^2$ and $yx_i$

                                          d.f.     $y^2$     $yx_i$

Rabbits                                    11        0        0
Doses                                       3      240      $d_i$
  Regression on log dose                    1      240      $d_i$
  Curvature                                 2        0        0
Rabbits X doses = error                    33        0        0
Sum = Error plus reg. on log dose          34      240      $d_i$

Total                                      47      240      $d_i$


the lowest dose level, and —1, +1, and +3 respectively for all observations at 
the successive higher dosage levels. We shall show that equations (9) solved 
in finding the L’s are formally the same as a set of normal equations for the linear 
regression of $y$ on $x_0$, $x_3$, and $x_4$.

The following analysis for $y^2$ and $yx_i$ may easily be verified.

It will be noted that the sum of products of $y$ and $x_i$ in the sum line is $d_i$.
Further, $S_{ij}$ is the sum of products of $x_i$ and $x_j$ for this line. It follows that the
normal equations for the regression of $y$ on the $x$’s, as calculated from the “sum”
line, are

$$S_{i0}L_0 + S_{i3}L_3 + S_{i4}L_4 = d_i \qquad (i = 0, 3, 4).$$

These are just the equations solved in obtaining the $L$’s. Consequently, $L_3$
and $L_4$ are the partial regression coefficients of $y$ on $x_3$ and $x_4$. A test of the null















hypothesis that the true values of $L_3$ and $L_4$ are both zero can be made by the
standard method for multiple regression, as will be shown later from theory.
This test is equivalent to a test of the hypothesis that the true value of $d'_I$ is zero.

To apply the test, we require three items in the analysis of variance of $y$.
First, the total sum of squares for the Sum line, already seen to be 240 (Table 5).
Second, the reduction due to a regression on all variates (covariance variates
plus discriminators). By the usual rules for regression, this is (from Table 4)

$$L_0d_0 + L_3d_3 + L_4d_4 = (.100008)(62) + (-.103417)(-1164) + (-.066883)(-809) = 180.69.$$

Finally, we need the reduction due to a regression on the variates that are not 
being tested, i.e. on the covariance variates alone. From Table 4, the reduction 


TABLE 6

Analysis of variance of dummy variate $y$

                                                        d.f.     S.S.      M.S.

Reduction to regression on covariance variates            1       1.62
Additional reduction to regression on discriminators      2     179.07     89.54
Deviations                                               31      59.31      1.913

Total (from Sum line)                                    34     240.00

in this case is simply $d_0^2/S_{00}$ or $(62)^2/2367$, or 1.62. The difference, 180.69 - 1.62,
represents the reduction due to the regression of $y$ on $x_3$ and $x_4$, after fitting $x_0$.
The resulting analysis is given in Table 6, the degrees of freedom being apportioned
by the usual rules.

The F ratio, 89.54/1.913, or 46.80, with 2 and 31 d.f., is used to test the null 
hypothesis that the adjusted discriminant has no real regression on log dose. 
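The entries of Table 6 follow from the $L$'s, the $d_i$ and $S_{00}$ alone. A minimal sketch, with variable names of our own choosing and the $L$'s taken as printed in Table 4:

```python
# Solutions of equations (9) and the right-hand sides d_i (Table 4).
L = {0: 0.100008, 3: -0.103417, 4: -0.066883}
d = {0: 62, 3: -1164, 4: -809}
S00 = 2367            # sum of squares of x_0 in the Sum line
total_ss = 240.0      # sum of squares of the dummy variate y in the Sum line
sum_df = 34
n_covariates = 1
n_discriminators = 2

# Reduction due to regression on all variates, and on the covariance variate alone.
reduction_all = sum(L[i] * d[i] for i in L)
reduction_cov = d[0] ** 2 / S00
additional = reduction_all - reduction_cov

# Deviations from the full regression and their degrees of freedom.
deviations = total_ss - reduction_all
dev_df = sum_df - n_covariates - n_discriminators

F = (additional / n_discriminators) / (deviations / dev_df)

print(round(reduction_all, 2))  # 180.69
print(round(reduction_cov, 2))  # 1.62
print(dev_df)                   # 31
print(round(F, 1))              # 46.8
```

Because the table rounds intermediate values, the F ratio computed without rounding differs from 46.80 only in the second decimal place.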

7. Test of particular discriminators. Another useful test is that of the null
hypothesis that a particular discriminator, or group of discriminators, contributes
nothing to the adjusted discriminant. In other words, this is a test of the null
hypothesis that the true values of a subset of the $L$'s are all zero. The test is of
interest in the present investigation, since it would be useful to know whether
all five hourly readings of the blood sugar are really helpful. As might be
expected by analogy with the previous section, the test is made by calculating the
additional reduction due to the regression of $y$ on the particular subset of the
$x$'s in question.

The test will be illustrated with respect to $L_4$. One method of making the
test is to re-solve the normal equations with $L_4$ omitted. From this solution








the reduction due to a regression of $y$ on $x_0$ and $x_3$ alone is obtained. The
additional reduction due to a regression on $x_4$ is found by subtraction from 180.69.

However, the additional reduction can be found directly from the well-known
regression theorem that it is equal to $L_4^2/c_{44}$. The $c$’s have already been found
in Table 4. The result is $(.066883)^2/(.000362846)$, or 12.33. This value is
tested against the residual error mean square of 1.913, F having 1 and 31 d.f.
The contribution is found to be significant.

In fact, by this process a kind of estimated standard error can be attached to
each of the $L$’s for the discriminators, using the formula $s\sqrt{c_{ii}}$, where $s$ is the
residual root mean square. Thus for $L_3$, (-.103417), the ‘standard error’ is
$\sqrt{(1.913)(.000227781)}$, or .0209. It should be stressed that at this point the
analogy with regression is rather thin. The $L$'s are not normally distributed,
nor do the estimated standard errors follow their usual distribution. It is,
however, correct that if the true value of $L_4$ is zero, $L_4/(s\sqrt{c_{44}})$ follows the t
distribution with 31 d.f. Thus, if omission of some discriminators seems warranted,


TABLE 7

Analysis of variance for regression of $y$ on the discriminators

                  d.f.      S.S.      M.S.

Regression          2     159.20     79.60
Deviations         32      80.80      2.525

Total              34     240.00


these t ratios are relevant in deciding which variate to eliminate first. Strictly 
speaking, the c’s should be re-calculated after each elimination before deciding 
which other discriminators might also be discarded. 
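The quantities used in this section can be computed from the diagonal of the inverse matrix in Table 4. The following sketch assumes the residual mean square 1.913 from Table 6; the variable names are ours.

```python
import math

# Diagonal elements c_ii of the inverse matrix (Table 4 lists 10^3 * c_ij).
c = {0: 0.465408e-3, 3: 0.227781e-3, 4: 0.362846e-3}
L = {0: 0.100008, 3: -0.103417, 4: -0.066883}
residual_ms = 1.913   # deviations mean square with 31 d.f. (Table 6)

# Additional reduction due to x_4 after fitting x_0 and x_3, and its F ratio.
additional_4 = L[4] ** 2 / c[4]
F_4 = additional_4 / residual_ms

# 'Standard errors' s * sqrt(c_ii) and the corresponding t ratios.
std_err = {i: math.sqrt(residual_ms * c[i]) for i in (3, 4)}
t_ratio = {i: L[i] / std_err[i] for i in (3, 4)}

print(round(additional_4, 2))  # 12.33
print(round(std_err[3], 4))    # 0.0209
```

For a single discriminator the square of the t ratio equals the F ratio with 1 and 31 d.f., so either form of the test may be used.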


8. Estimation of the gain due to covariance. The tests given above enable us 
to state whether the discriminators contribute significantly, in the statistical 
sense. It is also of interest to investigate what has been gained by the use of 
the covariance variates. From the practical point of view, the question: “What 
is the gain from covariance?” might be re-phrased as: “If x 0 is ignored, how 
many rabbits must be tested in order to estimate the regression on log dose as 
accurately as it was estimated with the adjusted discriminant for 12 rabbits?” 

The theoretical aspects of the question are discussed in section 16; the
calculations are described here. The only new quantity needed is the F ratio for the
regression of $y$ on the discriminators alone. This can be obtained by a new
solution of the normal equations, this time with the covariance variates omitted.
With just one covariance variate, it is quicker to use the fact that the additional
reduction to the regression of $y$ on $x_0$, after fitting $x_3$ and $x_4$, is $L_0^2/c_{00}$,
or $(.100008)^2/(.000465408)$, or 21.49. Consequently, the reduction due to a








regression of $y$ on $x_3$ and $x_4$ alone is 180.69 - 21.49, or 159.20. The F ratio,
79.60/2.525, is 31.52, whereas the F ratio with covariance is 46.80 (from
Table 6). The quantity suggested from theory for comparing the two techniques
is

$$\frac{(n_2 - 2)F}{n_2} - 1,$$

where $n_2$ is the number of d.f. in the denominator of F. These values are
{(30 X 31.52/32) - 1} or 28.55 with no covariance and {(29 X 46.80/31) - 1}
or 42.78 with covariance. The relative information is estimated as 42.78/28.55,
or 1.50, so that the use of covariance gives 50 per cent more information. In
other words about 18 rabbits would be needed if the initial blood sugars were
ignored. To a slight extent this estimate favors the covariance analysis, since
it ignores the increased accuracy that would accrue from the extra error d.f. if
18 rabbits were used without covariance.
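The comparison of the two techniques can be retraced step by step. In the sketch below the function name `information` is ours; it evaluates the quantity given above, with the denominator degrees of freedom passed as `df`.

```python
def information(F, df):
    """Quantity (df - 2) F / df - 1, where df is the denominator d.f. of F."""
    return (df - 2) * F / df - 1


total_ss = 240.0
reduction_all = 180.69                    # regression of y on x_0, x_3 and x_4
additional_x0 = 0.100008 ** 2 / 0.465408e-3   # L_0^2 / c_00

# Regression of y on the discriminators alone (Table 7).
reduction_disc = reduction_all - additional_x0
F_without = (reduction_disc / 2) / ((total_ss - reduction_disc) / 32)
F_with = 46.80                            # from Table 6

info_without = information(F_without, 32)
info_with = information(F_with, 31)
relative = info_with / info_without

print(round(F_without, 1))     # 31.5
print(round(relative, 2))      # 1.5
print(round(12 * relative))    # 18
```

The last line restates the relative information as the number of rabbits needed without the covariance variate to match 12 rabbits with it.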

PART II Theory 

9. Notation. The theory will be given first for the two-population case.
We suppose that a random sample of size $N$ has been drawn from each
population. A typical discriminator is written $x_{iwa}$ and a typical covariance variate
$x_{\xi wa}$, where

$i, j = 1, 2, \cdots, p$ denote discriminators,
$\xi, \eta = 1, 2, \cdots, k$ denote covariance variates,
$w = 1, 2$ denotes the population, and
$a = 1, 2, \cdots, N$ denotes the order within the sample.

The population mean of $x_{iwa}$ is $\mu_{iw}$, and the corresponding sample mean is $\bar{x}_{iw\cdot}$.
The difference $(\mu_{i2} - \mu_{i1})$ is denoted by $\delta_i$ and the corresponding estimated
difference $(\bar{x}_{i2\cdot} - \bar{x}_{i1\cdot})$ by $d_i$.

10 . Discriminant functions and generalized distance. Since we propose to 
approach the theory by means of the generalized distance, it may be well to 
review briefly the relation between the discriminant and the generalized distance. 
In the ordinary theory (with no covariance variates) it is assumed that the
variates $x_{iwa}$ follow a multivariate normal distribution, and that the covariance
matrix $\sigma_{ij}$ between $x_{iwa}$ and $x_{jwa}$ is the same in both populations. The
generalized distance of Mahalanobis is defined by

(10)   $$p\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij}\delta_i\delta_j, \quad \text{where } (\sigma^{ij}) = (\sigma_{ij})^{-1}.$$

In order to estimate this quantity from the sample, we first calculate the mean
within-sample covariance $s_{ij}$, where

(11)   $$s_{ij} = \frac{1}{2(N-1)}\sum_{w=1}^{2}\sum_{a=1}^{N}(x_{iwa} - \bar{x}_{iw\cdot})(x_{jwa} - \bar{x}_{jw\cdot}).$$





The estimated distance is then taken as

(12)   $$pD^2 = \sum_{i,j=1}^{p}s^{ij}d_id_j.$$

Apart from a factor $N/(N - 1)$, this is the maximum likelihood estimate.

In the discriminant function used by Fisher (1), the object is to find a linear
function $I_{wa} = \sum M_ix_{iwa}$, where the $M_i$ are chosen to maximize the ratio of the
sum of squares between samples to that within samples in the analysis of variance
of $I$. This is equivalent to maximizing the ratio of the difference between the
two sample means of $I$ to the estimated standard error of this difference. As
Fisher showed (2), the $M_i$ (apart from an arbitrary multiplier) are given by

$$M_i = \sum_{j=1}^{p}s^{ij}d_j.$$

Consequently, the difference between the two sample means of $I$, the discriminant
function, is

$$\sum_{i=1}^{p}M_id_i = \sum_{i,j=1}^{p}s^{ij}d_id_j.$$

This is exactly the same as $pD^2$ in equation (12). Thus the discriminant
function leads to the estimated distance, and vice versa.
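The identity between the discriminant and the estimated distance is easy to confirm on a small data set. The numbers below are hypothetical, invented purely for illustration (two samples of $N = 5$ with $p = 2$ discriminators), and the helper names are ours.

```python
def mean(values):
    return sum(values) / len(values)


def within_sp(u, v):
    """Within-sample sum of products of two variates, pooled over both samples."""
    total = 0.0
    for us, vs in zip(u, v):
        mu, mv = mean(us), mean(vs)
        total += sum((a - mu) * (b - mv) for a, b in zip(us, vs))
    return total


# Hypothetical data: x[i][w] holds the N observations of discriminator i in sample w.
x = [[[20, 23, 19, 22, 25], [25, 24, 27, 22, 29]],
     [[5, 7, 4, 8, 6], [9, 7, 10, 8, 11]]]
N = 5

# Mean within-sample covariances s_ij with 2(N - 1) degrees of freedom.
s = [[within_sp(x[i], x[j]) / (2 * (N - 1)) for j in range(2)] for i in range(2)]
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
s_inv = [[s[1][1] / det, -s[0][1] / det],
         [-s[1][0] / det, s[0][0] / det]]

# Differences between the two sample means.
d = [mean(x[i][1]) - mean(x[i][0]) for i in range(2)]

# Estimated distance (12) and discriminant weights M_i.
D2 = sum(s_inv[i][j] * d[i] * d[j] for i in range(2) for j in range(2))
M = [sum(s_inv[i][j] * d[j] for j in range(2)) for i in range(2)]

# Discriminant scores and the difference between their two sample means.
scores = [[sum(M[i] * x[i][w][a] for i in range(2)) for a in range(N)] for w in range(2)]
mean_difference = mean(scores[1]) - mean(scores[0])

print(abs(mean_difference - D2) < 1e-9)  # True
```

Multiplying all the $M_i$ by a constant rescales the discriminant but leaves the ratio that is being maximized unchanged.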


11. Extension to the present problem. In our case there are $(p + k)$ variates
($p$ discriminators, $k$ covariance variates) from which to estimate the distance.
All variates, $x_{iwa}$ and $x_{\xi wa}$, are assumed to follow a multivariate normal
distribution. The covariance matrix, assumed the same in both populations, now has
$(p + k)$ rows and columns, and may be denoted by



For each of the covariance variates, it is known that the population means
$\mu_{\xi 1}$, $\mu_{\xi 2}$ are equal, so that the difference $\delta_\xi$ is zero. This is the fact that
distinguishes the problem from ordinary discriminant function analysis.

Hence, the generalized distance, as defined from all $(p + k)$ variates, contains
no contribution from terms in $\delta_\xi$ and is given by

$$(p+k)\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij}_{(p+k)}\delta_i\delta_j.$$

The matrix $\sigma^{ij}_{(p+k)}$ is that formed by the first $p$ rows and columns of the inverse
of A. Note that in general this will not be the same as the matrix $\sigma^{ij}$, which is
the inverse of $\sigma_{ij}$.

In the next section we consider the estimation of this quantity from the sample
data. By analogy with the previous section, it might be guessed that the
estimate would be of the form $\sum s^{ij}_{(p+k)}d_id_j$. The maximum likelihood estimate





turns out to be of this form, except that instead of $d_i$ we have $d'_i$, the difference
between the two sample means of the deviations of $x_i$ from its ‘within-sample’
linear regression on the $x_\xi$.


12. Estimation of the distance. It is known that the generalized distance
is invariant under non-singular linear transformations of the variates. For
convenience, we replace the $x_{iwa}$ by variates $x'_{iwa}$, where

$$x'_{iwa} = x_{iwa} - \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi wa} - \mu_{\xi w}).$$

Thus $x'_{iwa}$ is the deviation of $x_{iwa}$ from its population linear regression on the
$x_{\xi wa}$. The population mean of $x'_{iwa}$ is clearly $\mu_{iw}$, and the difference between
the two population means is therefore $\delta_i$.

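The covariance matrix of the $x'_{iwa}$, $x_{\xi wa}$ may be written

(15)   $$\begin{pmatrix}\sigma_{ij\cdot\xi} & 0\\ 0 & \sigma_{\xi\eta}\end{pmatrix},$$

where $\sigma_{ij\cdot\xi}$ denotes the covariance matrix of the deviations of the $x_{iwa}$ from their
regressions on the $x_{\xi wa}$. It follows that in terms of the transformed variates
the generalized distance is given by

(16)   $$(p+k)\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j,$$

where $\sigma^{ij\cdot\xi}$ is the inverse of the $p \times p$ matrix $\sigma_{ij\cdot\xi}$.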

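The joint distribution of the $2N$ observations on each of the $x'_{iwa}$ and $x_{\xi wa}$
is as follows:

$$(2\pi)^{-N(p+k)}\,|\sigma^{ij\cdot\xi}|^{N}\,|\sigma^{\xi\eta}|^{N}\,\prod dx'_{iwa}\,dx_{\xi wa}\;\exp\Big\{-\tfrac{1}{2}\Big[\sum_{w=1}^{2}\sum_{a=1}^{N}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x'_{iwa} - \mu_{iw})(x'_{jwa} - \mu_{jw})$$
$$\qquad + \sum_{w=1}^{2}\sum_{a=1}^{N}\sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta}(x_{\xi wa} - \mu_{\xi w})(x_{\eta wa} - \mu_{\eta w})\Big]\Big\},$$

where $\sigma^{\xi\eta}$ is the inverse of the $k \times k$ matrix $\sigma_{\xi\eta}$.

We now proceed to estimate $\Delta^2$ in equation (16) by maximum likelihood.
For this, we obviously need the sample estimates of the $\sigma^{ij\cdot\xi}$ and the $\delta_i$, and it
will appear presently that the sample estimates of the $\beta_{i\xi}$ are also required.
However, it happens that the $\sigma^{\xi\eta}$ and the $\mu_{\xi w}$ are not needed. Hence the
relevant part of the likelihood function is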





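(17)   $$L = N\log|\sigma^{ij\cdot\xi}| - \tfrac{1}{2}\sum_{w=1}^{2}\sum_{a=1}^{N}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x'_{iwa} - \mu_{iw})(x'_{jwa} - \mu_{jw}),$$

where

$$x'_{iwa} = x_{iwa} - \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi wa} - \mu_{\xi w}).$$

Differentiating first with respect to $\mu_{iw}$, we obtain

(18)   $$\sum_{a=1}^{N}\sum_{j=1}^{p}\sigma^{ij\cdot\xi}(x'_{jwa} - \mu_{jw}) = 0.$$

Except in the case (with probability zero) where our estimate of $\sigma^{ij\cdot\xi}$ turns out
to be singular, these equations have no solution except

(19)   $$\sum_{a=1}^{N}(x'_{jwa} - \mu_{jw}) = 0$$

for every $j$, $w$. Consequently

$$\hat{\mu}_{jw} = \bar{x}'_{jw\cdot},$$

so that

$$\hat{\delta}_j = \hat{\mu}_{j2} - \hat{\mu}_{j1} = \bar{x}'_{j2\cdot} - \bar{x}'_{j1\cdot}.$$

This shows that the $\beta_{i\xi}$ must also be estimated. Now

$$\frac{\partial L}{\partial\beta_{i\xi}} = \sum_{w=1}^{2}\sum_{a=1}^{N}\frac{\partial L}{\partial x'_{iwa}}\frac{\partial x'_{iwa}}{\partial\beta_{i\xi}} = \sum_{w=1}^{2}\sum_{a=1}^{N}\sum_{j=1}^{p}\sigma^{ij\cdot\xi}(x_{\xi wa} - \mu_{\xi w})(x'_{jwa} - \mu_{jw}).$$

Once again, unless the estimate of $\sigma^{ij\cdot\xi}$ is singular, the only solutions of the
equations formed by equating this quantity to zero are

(20)   $$\sum_{w=1}^{2}\sum_{a=1}^{N}(x_{\xi wa} - \mu_{\xi w})(x'_{jwa} - \mu_{jw}) = 0$$

for every $\xi$, $j$.

Since $\hat{\mu}_{jw} = \bar{x}'_{jw\cdot}$, the term in $\mu_{\xi w}$ vanishes. Substituting for $x'$ in terms of $x$
from (17), we obtain

$$\sum_{w=1}^{2}\sum_{a=1}^{N}x_{\xi wa}\Big\{(x_{jwa} - \bar{x}_{jw\cdot}) - \sum_{\eta=1}^{k}b_{j\eta}(x_{\eta wa} - \bar{x}_{\eta w\cdot})\Big\} = 0,$$

where $b_{j\eta}$ stands for the maximum likelihood estimate of $\beta_{j\eta}$. These equations
may be written

(21)   $$\sum_{\eta=1}^{k}b_{j\eta}E_{\eta\xi} = E_{j\xi},$$

where $E$ denotes a sum of squares or products of deviations from the sample
means, containing $2(N - 1)$ degrees of freedom. The equations are therefore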





the ordinary normal equations for the ‘within-sample’ multiple regression of
$x_{jwa}$ on the $x_{\xi wa}$.

Finally, differentiation of $L$ with respect to the $\sigma^{ij\cdot\xi}$ leads to

(22)   $$2N\hat{\sigma}_{ij\cdot\xi} = \sum_{w=1}^{2}\sum_{a=1}^{N}(x'_{iwa} - \bar{x}'_{iw\cdot})(x'_{jwa} - \bar{x}'_{jw\cdot}).$$

This is just the ‘within-samples’ sum of squares or products of the variates $x'$.
On substituting for the $x'$ in terms of the $x$ and using equations (21), we obtain

$$2N\hat{\sigma}_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi} = E_{ij\cdot\xi} \text{ (say)}.$$

To summarize, the estimated distance is given by means of the equation

$$(p+k)D^2 = \sum_{i,j=1}^{p}\hat{\sigma}^{ij\cdot\xi}d'_id'_j = 2N\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_id'_j,$$

where $E^{ij\cdot\xi}$ is the inverse of $E_{ij\cdot\xi}$ and

$$d'_i = d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi.$$

This estimate was obtained by assuming all variates jointly normally
distributed. From the form of the likelihood function (17) it can be seen that the
M.L. estimate of the distance remains the same under the less restrictive
assumptions that the $x_{\xi wa}$ are fixed, while the deviations of the $x_{iwa}$ from their regressions
on the $x_{\xi wa}$ are jointly normal.

13. Computational procedure. An orderly procedure for calculating the
generalized distance will now be given. From this, the method for computing
the corresponding discriminant function will be shown. The computations also
lead to the generalization of Hotelling’s $T^2$. The steps are as follows.

(i). First form the ‘within-sample’ sums of squares and products of all variates,
with $2(N - 1)$ degrees of freedom. These are the quantities denoted by
$E_{ij}$, $E_{i\xi}$, $E_{\xi\eta}$.

(ii). Invert the matrix $E_{\xi\eta}$, giving $E^{\xi\eta}$.

(iii). The regression coefficients $b_{i\xi}$, estimates of the $\beta_{i\xi}$, are now obtainable by
means of the relations

$$b_{i\xi} = \sum_{\eta=1}^{k}E_{i\eta}E^{\eta\xi},$$

as is clear from the usual matrix solution of equations (21).

(iv). The sums of squares and products of the deviations of the $x_i$ from their
‘within-sample’ regressions on the $x_\xi$ are now computed from equations (22):

$$2N\hat{\sigma}_{ij\cdot\xi} = E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi}.$$



(v). The final step is to invert the matrix E ^.$, giving E**'*, and to form the 
product 

(p + k)D* = 2N ^ E iJ * di d't , where d\ = d* — 23 &%* d*. 

When there were no covariance variates, the discriminant function I had the
property that the difference between the two sample means of I was equal to the
estimated distance (Section 10). This relationship can be preserved when
covariance variates are present by defining I so that

$I = \sum_{i=1}^{p} M_i\Big(x_i - \sum_{\xi=1}^{k} b_{i\xi}x_\xi\Big)$

and calculating the weights $M_i$ from the equations

$\sum_{j=1}^{p} E_{ij\cdot\xi} M_j = d'_i.$

For in that case,

$M_i = \sum_{j=1}^{p} E^{ij\cdot\xi} d'_j.$

Consequently the difference between the two sample means of I is

$\sum_{i=1}^{p} M_i d'_i = \sum_{i,j=1}^{p} E^{ij\cdot\xi} d'_i d'_j,$

which (apart from the constant 2N) is equal to $(p + k)D^2$.


14. Distribution of the estimated distance. In the ordinary case, with no
covariance variates, the frequency distribution of the estimated distance has
been given by several authors, e.g. Hsu [6]. It will be found that in our problem
the distribution is essentially the same, except that the quantity $D^2$ must be
multiplied by a new factor and that one set of degrees of freedom entering into
the result must be changed from $(n - p + 1)$ to $(n - p - k + 1)$.

Thus far we have assumed that all variates jointly follow a multivariate normal
distribution. It is convenient at this stage to regard the covariance variates
$x_{\xi w\alpha}$ as fixed from sample to sample, and to use the conditional distribution of
the $x_{iw\alpha}$, subject to this restriction. It is well known (e.g. Cramér [7, section
24.6]) that this conditional distribution is the multivariate normal

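(23) $(2\pi)^{-Np}\,|\sigma^{ij\cdot\xi}|^{N}\exp\Big\{-\tfrac12\sum_{w=1}^{2}\sum_{\alpha=1}^{N}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x_{iw\alpha}-\gamma_{iw\alpha})(x_{jw\alpha}-\gamma_{jw\alpha})\Big\},$

where

$\gamma_{iw\alpha} = \mu_{iw} + \sum_{\xi=1}^{k}\beta_{i\xi}x_{\xi w\alpha}.$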





Since the estimated distance is a function of the quantities $E_{ij\cdot\xi}$, $d'_i$, we now
find the joint distribution of these variates. The joint distribution of the
sums of squares and products $E_{ij\cdot\xi}$ is obtained by quoting a slight extension of a
result due to Bartlett [8], which may be stated as follows.

Let the variates $x_{iw\alpha}$ follow the distribution (23) and let

(i) $E_{ij} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{iw\alpha}-\bar{x}_{iw})(x_{jw\alpha}-\bar{x}_{jw})$

be a typical 'within-samples' sum of squares or products,

(ii) $b_{i\xi} = \sum_{\eta=1}^{k}E_{i\eta}E^{\xi\eta}$

be the 'within-samples' partial regression coefficient of $x_i$ on $x_\xi$, and

(iii) $E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi}$

be the sum of squares or products of deviations from these regressions. Then

(a) the quantities $E_{ij\cdot\xi}$ follow the Wishart distribution

$c\,|E_{ij\cdot\xi}|^{\frac12(n-k-p-1)}\exp\Big\{-\tfrac12\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}E_{ij\cdot\xi}\Big\}$

with $(n - k)$ d.f., where $n = 2(N - 1)$,

(b) this distribution is independent of that of the $b_{i\xi}$, and

(c) both distributions are independent of that of the means $\bar{x}_{iw}$ and consequently
of that of the difference $d_i = (\bar{x}_{i2} - \bar{x}_{i1})$.

The result was proved by Bartlett for a sample from a single population. The
extension to the case of two populations is straightforward and will not be given
in detail.

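From (b) and (c) it follows that the distribution of the $E_{ij\cdot\xi}$ is independent
of that of the quantities

$d'_i = d_i - \sum_{\xi=1}^{k} b_{i\xi} d_\xi.$

Further, with the $x_\xi$ variates fixed, the $d'_i$ are linear functions of the $x_{iw\alpha}$ with
constant coefficients and hence follow a multivariate normal distribution, Wilks
[9]. We now find the means and the covariance matrix of this joint distribution.
From the joint distribution (23) of the $x_{iw\alpha}$, it is easily seen that

(24) $E(d_i) = \delta_i + \sum_{\xi=1}^{k}\beta_{i\xi}d_\xi.$

Also, since by standard regression theory the $b_{i\xi}$ are unbiased estimates of the
$\beta_{i\xi}$,

$E\Big(\sum_{\xi=1}^{k} b_{i\xi}d_\xi\Big) = \sum_{\xi=1}^{k}\beta_{i\xi}d_\xi.$

Hence, by subtraction,

(25) $E(d'_i) = \delta_i.$

Now

$\mathrm{Cov}(d'_i d'_j) = \mathrm{Cov}\Big(d_i - \sum_{\xi=1}^{k} b_{i\xi}d_\xi\Big)\Big(d_j - \sum_{\eta=1}^{k} b_{j\eta}d_\eta\Big).$

By (c) the distributions of the $d_i$, $b_{i\xi}$ are independent, so that there will be no
contribution from products of the form $d_i b_{j\eta}$. Hence

(26) $\mathrm{Cov}(d'_i d'_j) = \mathrm{Cov}(d_i d_j) + \sum_{\xi,\eta=1}^{k} d_\xi d_\eta\,\mathrm{Cov}(b_{i\xi} b_{j\eta}).$

Since $d_i$ is the difference between the means of two samples of size N,
$\mathrm{Cov}(d_i d_j)$ is $2\sigma_{ij\cdot\xi}/N$. The covariance of $b_{i\xi}$ and $b_{j\eta}$ is more troublesome. Writing
the expressions for these regression coefficients in terms of the original data, we
have

$\mathrm{Cov}(b_{i\xi} b_{j\eta}) = \sum_{\lambda,\nu=1}^{k} E^{\xi\lambda}E^{\eta\nu}\,\mathrm{Cov}(E_{i\lambda}E_{j\nu})$

$= \sum_{\lambda,\nu=1}^{k} E^{\xi\lambda}E^{\eta\nu}\sum_{w,z=1}^{2}\sum_{\alpha,\zeta=1}^{N}(x_{\lambda w\alpha}-\bar{x}_{\lambda w})(x_{\nu z\zeta}-\bar{x}_{\nu z})\,\mathrm{Cov}(x_{iw\alpha}x_{jz\zeta}).$

Since successive observations are assumed independent, the covariance term
vanishes unless $w = z$ and $\alpha = \zeta$, in which case it equals $\sigma_{ij\cdot\xi}$. Thus

$\mathrm{Cov}(b_{i\xi}b_{j\eta}) = \sigma_{ij\cdot\xi}\sum_{\lambda,\nu=1}^{k}E^{\xi\lambda}E^{\eta\nu}E_{\lambda\nu} = \sigma_{ij\cdot\xi}E^{\xi\eta}.$

Finally, from (26)

(27) $\mathrm{Cov}(d'_i d'_j) = \sigma_{ij\cdot\xi}\Big\{\frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta\Big\} = v\,\sigma_{ij\cdot\xi}$ (say).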

Having obtained the distributions of the $E_{ij\cdot\xi}$, $d'_i$, we may apply Hsu's result
[6] for the general distribution of Hotelling's $T^2$. In our notation, this may be
stated as follows.

If the variates $d'_i/\sqrt{v}$ follow the multivariate normal distribution with means
$\delta_i/\sqrt{v}$ and covariance matrix $\sigma_{ij\cdot\xi}$, and if the variates $E_{ij\cdot\xi}$ follow the Wishart
distribution with $(n - k)$ d.f. and covariance matrix $\sigma_{ij\cdot\xi}$, the two distributions
being independent, then

$y = \frac{1}{v}\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j$

follows the distribution

(28) $e^{-\tau}\sum_{h=0}^{\infty}\frac{\tau^h}{h!}\,\frac{y^{\frac12 p+h-1}(1+y)^{-\frac12(n-k+1)-h}}{B\{\frac12 p+h,\ \frac12(n-k-p+1)\}}\,dy,$

where

$\tau = \frac{1}{2v}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j,$

$v = \frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta, \quad n = 2(N - 1).$

This distribution is, of course, the distribution of the ratio of two independent
values of $\chi^2$, with p and $(n - k - p + 1)$ d.f. respectively, in the case where the
numerator is non-central.

15. Tests of significance. This result leads to the extension of Hotelling's
$T^2$ test. For if $\delta_i = 0$, $(i = 1, 2, \cdots, p)$, then $\tau$ is zero and

$\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j$

is distributed as $vpF/(n - k - p + 1)$, with p and $(n - k - p + 1)$ d.f. The
distribution (28) above gives the power function of this test.

We may also wish to apply a test of this type to a subgroup $x_i$ of the
discriminators $(i = 1, 2, \cdots, q < p)$. Speaking popularly, this is a test of the null
hypothesis that the above variates $x_i$ contribute nothing to the discrimination
between the two populations, given that the remaining discriminators and the
covariance variates have already been included.¹ To see what is meant more
precisely, consider the following transformation:

x\ *= Xi — 2 PuXi — 2 &$£*, i = 1 , 2 

x\ = xi - 2 fax !, l = g + 1, • • • p; 

*1 = *1 > £ = 1, 2 , • ■ • ft, 

where the $\beta$'s are population regression coefficients. Then it is not difficult to
see that the distance is now given by

(p + k) A 2 = 22 Q* + ^2 <? lm 

t, 7—1 g-fl 

where $\sigma^{ij\cdot l\xi}$ is the inverse of the covariance matrix of the deviations of the $x_i$
from their regressions on the $x_l$ plus the $x_\xi$, and

8i = Si — 2 jfttjSj. 

Consequently if $\delta'_i = 0$, $(i = 1, 2, \cdots, q)$, the distance is exactly the same as it
would be if the variates $x_i$ were omitted. The test in question is therefore a test
of the null hypothesis that $\delta'_i = 0$, $(i = 1, 2, \cdots, q)$.

If both the remaining discriminators and the covariance variates $x_\xi$ are
regarded as fixed, the method of proof in the previous section provides an F test


1 The test is illustrated in section 7. 





for this hypothesis also. It is found that the sums of squares or products $E_{ij\cdot l\xi}$
follow a Wishart distribution with $(n - k - p + q)$ d.f., while the quantities

$d'_i = d_i - \sum_{l=q+1}^{p}b_{il}d_l - \sum_{\xi=1}^{k}b_{i\xi}d_\xi$

are normally distributed, with zero means when the null hypothesis is true. This
leads to the result that

$\sum_{i,j=1}^{q}E^{ij\cdot l\xi}d'_i d'_j$

is distributed as $v'qF/(n - k - p + 1)$, with q and $(n - k - p + 1)$ d.f., and

$v' = \frac{2}{N} + \sum E^{\lambda\mu}d_\lambda d_\mu,$

the sum extending over both the covariance variates and the discriminators that
are not being tested.

16. Discussion of the gain due to covariance. In this section we attempt to 
construct a measure of the amount that has been gained by the use of the co- 
variance variates. Only a preliminary discussion will be given: a complete dis¬ 
cussion would be rather lengthy, owing to the many different uses to which the 
discriminant function can be put. Perhaps the problem can most easily be seen 
by considering the effect on Hotelling’s generalized T 2 test of significance. 

The power function of this test, as obtained from equation (28) section 14,
depends on four factors: the level of significance that is chosen, the degrees of
freedom $n_1$ and $n_2$ in the numerator and denominator of F, and the parameter $\tau$.
If the covariance variates were ignored, the usual $T^2$ test could be applied to the
discriminators alone. In this case we would have

$n'_1 = p, \quad n'_2 = n - p + 1, \quad \tau' = \tfrac12\sum\sigma^{ij}\delta_i\delta_j/v',$ where $v' = 2/N$.

With the covariance variates, we have

$n_1 = p, \quad n_2 = n - p - k + 1, \quad \tau = \tfrac12\sum\sigma^{ij\cdot\xi}\delta_i\delta_j/v,$

where

$v = \frac{2}{N} + \sum E^{\xi\eta}d_\xi d_\eta.$

The first point to note is that

$\sum\sigma^{ij\cdot\xi}\delta_i\delta_j \ge \sum\sigma^{ij}\delta_i\delta_j.$

This is an instance of the general result that the addition of new variates cannot
decrease the value of $p\Delta^2$. To see this, replace the covariance variates by their
deviations from their regressions on the discriminators. This transformation
gives

(29) $\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j = \sum_{i,j=1}^{p}\sigma^{ij}\delta_i\delta_j + \sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta\cdot i}\delta'_\xi\delta'_\eta,$

where

$\delta'_\xi = \delta_\xi - \sum_{i=1}^{p}\beta_{\xi i}\delta_i.$

Since the last term on the right of equation (29) is a positive definite quadratic
form, the result follows.

Consequently, the first effect of the covariance variates is to make the
numerator of $\tau$ greater than that of $\tau'$. As a partial compensation, the denominator
$v$ is also greater than $v'$, but it may be shown that the difference in the
denominators will usually be trivial if k is small relative to n. We therefore expect $\tau$
to be greater than $\tau'$. Now for fixed $n_1$, $n_2$ and significance level, it is well
known that the power function (28) is monotone increasing with $\tau$. Hence,
other things being equal, the increase in $\tau$ due to the covariance variates leads to
a more powerful test.

The two power functions, however, differ in another respect, in that with
covariance the value of $n_2$ is reduced from $(n - p + 1)$ to $(n - p - k + 1)$. This
decrease in the number of degrees of freedom in the denominator of F will to
some extent offset the gain from an increased $\tau$. Examination of Tang's tables
[10] indicates, however, that if the degrees of freedom are substantial, this effect
will not be important. Moreover, in most practical applications, k is likely to
be only 1 or 2. Hence, as a first approximation the effect will be ignored, though
to do so tends to overestimate the advantage of covariance.

Suppose now that $\tau = r\tau'$, where $r > 1$. Since $\tau'$ is proportional to N, the
size of sample taken from each population, we could make $\tau' = \tau$ by increasing
the size of sample (when covariance is not used) from N to rN. This suggests
that the ratio $\tau/\tau'$ can be used, as a first approximation, to measure the relative
accuracy obtained with and without the use of covariance. This measure carries
approximately the usual interpretation that the inferior method would become
as good as the superior method if the sample size for the inferior method were
increased by the factor r. A further refinement could be made to take account
of the difference in the $n_2$ values. By trial and error applied to Tang's tables,
one could determine $\tau'$ so that the two power functions would be as nearly
coincident as possible.

In practice, the ratio $\tau/\tau'$ must be estimated from the data. From the power
function in equation (28) it is found by integration that the mean value of y is

$(2\tau + p)/(n_2 - 2),$

so that an unbiased estimate of $\tau$ is

$\tfrac12\{(n_2 - 2)y - p\} = \tfrac12 p\Big\{\frac{(n_2 - 2)}{n_2}F - 1\Big\}.$





This suggests that the quantity

$\frac{(n_2 - 2)}{n_2}F - 1$

should be calculated both with and without covariance. The ratio of the two
values will probably not be an unbiased estimate of $\tau/\tau'$, but may be used
pending further information about its sampling distribution. This type of
calculation is made for the numerical example in section 8.
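
The calculation just described is short enough to sketch in code. The function names below are hypothetical, and the F values in any call are placeholders supplied by the user rather than figures from the paper; the sketch only encodes the relation $y = pF/n_2$ and the unbiased estimate $\tfrac12\{(n_2 - 2)y - p\}$.

```python
def tau_estimate(F, p, n2):
    """Unbiased estimate of tau: (1/2){(n2 - 2) y - p}, with y = p F / n2."""
    y = p * F / n2
    return 0.5 * ((n2 - 2) * y - p)

def gain_ratio(F_cov, n2_cov, F_plain, n2_plain, p):
    """Ratio of the two estimates, with and without covariance.

    Assumes the estimate without covariance is positive; as the text notes,
    the ratio is probably not an unbiased estimate of tau / tau'.
    """
    return tau_estimate(F_cov, p, n2_cov) / tau_estimate(F_plain, p, n2_plain)
```

Because the factor $\tfrac12 p$ is common to both estimates, the ratio depends only on the two values of $(n_2 - 2)F/n_2 - 1$.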

17. The case of a row by column classification. Thus far the discussion has
been confined to the case where there are only two populations. The technique
may also be used when there are more than two populations. The difference
$\delta_i$ between the two population means is replaced by some linear function of the
population means. As an illustration we consider a row by column classification,
the case that arises in the numerical example. No detailed proofs will be given,
though it is hoped that the theory can be fairly easily developed from the
mathematical model.

A typical variate is $x_{iwz}$, where $i = 1, 2, \cdots, p$ denotes the variate,
$w = 1, 2, \cdots, r$ denotes the row and $z = 1, 2, \cdots, c$ denotes the column, there being
one observation in each cell. The variates $x_{iwz}$ follow a multivariate normal
distribution, with covariance matrix $\sigma_{ij\cdot\xi}$ and means

$E(x_{iwz}) = \mu_i + \rho_{iw} + \gamma_{iz} + \sum_{\xi=1}^{k}\beta_{i\xi}x_{\xi wz},$

where $\rho_{iw}$ denotes the effect of the row and $\gamma_{iz}$ that of the column. Without loss
of generality we may assume that

$\sum_w \rho_{iw} = \sum_z \gamma_{iz} = 0.$

In addition, there exists a known set of variates $t_z$ such that

$\gamma_{iz} = \delta_i t_z, \quad \sum_z t_z = 0.$

That is, the column constants have a linear regression on a set of known numbers.
The following are the maximum likelihood estimates of the relevant constants.

b,( = £ 

n~l 


E„ - E *«,.{**» - 

£ <.(* < 10 * - £ b xi x t .,) 




_ vr,z 


£ti 


where 





In the notation used for numerical calculation, 


I _ (<U — ^ d$) __ di 

the quantity $X_{i\cdot z}$ being the column total.


where 

Finally 


4 - £<.X ( ., 

v 


rc^,/.{ = fi’i/e = Eij - £ 
s 


The distributional properties are similar to those in the two-population case.
The quantities $E_{ij\cdot\xi}$ follow a Wishart distribution with $(rc - r - 1 - k)$ d.f.
and covariance matrix $\sigma_{ij\cdot\xi}$. The variates $d'_i$ follow a multivariate normal
distribution with means $r\delta_i\sum t_z^2$ and covariance

$\sigma_{ij\cdot\xi}\Big(r\sum t_z^2 + \sum E^{\xi\eta}d_\xi d_\eta\Big) = v\sigma_{ij\cdot\xi}$ (say).

Consequently,

$y = \sum E^{ij\cdot\xi}d'_i d'_j$

is distributed as $vpF/(rc - r - p - k)$ with p and $(rc - r - p - k)$ d.f. and
parameter

$\tau = \tfrac12\Big(r\sum t_z^2\Big)^2\sum\sigma^{ij\cdot\xi}\delta_i\delta_j/v.$

Thus in the numerical example, with r = 12, c = 4, p = 2, k = 1, this procedure
would have given an F test of the null hypothesis $\tau = 0$, where F has 2 and 33
d.f. However, the contribution from 2 degrees of freedom was deliberately
omitted from the quantities $E_{ij}$, so that F actually had 2 and 31 d.f.


PART III 

18. Justification of the 'dummy variate' approach. It remains to show that
the method of calculation used in the example (sections 5 and 6) is equivalent to
that derived from theory. There are two chief points to prove. First, that the
M's found from the equations

(30) $\sum_j E_{ij\cdot\xi}M_j = d'_i$

are proportional to the corresponding L's found from the equations

(31) $\sum_a S_{ia}L_a = d_i,$

where the suffix a denotes summation over both $x_i$ and $x_\xi$ variates.

Now, since $S_{ij} = E_{ij} + d_i d_j/240$, equations (31) are the same as

(32) $\sum_a E_{ij}L_j = d_i\Big(1 - \sum_a L_j d_j/240\Big).$

Hence the L's in (31) are proportional to the values found from the equations

(33) $\sum_a E_{ij}L'_j = d_i.$



But it is well known that if the $L'_\xi$ are eliminated one by one from equations (33),
we obtain

$\sum_j E_{ij\cdot\xi}L'_j = d'_i,$

which is the same as (30). This proves the first point.

The second point to establish is that the F test in the example is the same as
that obtained from theory. In section 15, it was shown that

(34) $\sum E^{ij\cdot\xi}d'_i d'_j/v$

is distributed as $pF/(n - p - k + 1)$. In the analysis of variance of Table 6,
section 6, the quantity following the same distribution was

(35) $\frac{S_a - S_\xi}{240 - S_a},$

where

$S_a = \sum_a S^{ij}d_i d_j, \quad S_\xi = \sum_{\xi,\eta}S^{\xi\eta}d_\xi d_\eta.$

Since equations (31) and (32) have the same solution, we must have

$S^{ij} = E^{ij}\Big(1 - \sum_a L_j d_j/240\Big) = E^{ij}(1 - S_a/240).$

Multiplying both sides by $d_i d_j$ and summing over all i, j, we obtain

$S_a = E_a(1 - S_a/240) = E_a/(1 + E_a/240),$

where $E_a$ is defined analogously to $S_a$. Similarly

$S_\xi = E_\xi/(1 + E_\xi/240).$

Hence

(36) $\frac{S_a - S_\xi}{240 - S_a} = \frac{E_a - E_\xi}{240 + E_\xi} = \frac{E_a - E_\xi}{v}.$

Transform the variates $x_i$, $x_\xi$ into variates $x'_i$, $x_\xi$, where $x'_i = x_i - \sum b_{i\xi}x_\xi$.
It is easy to see that this transforms

$\sum_a E^{ij}d_i d_j$ into $\sum_{\xi,\eta}E^{\xi\eta}d_\xi d_\eta + \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j.$

That is,

$E_a = E_\xi + \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j,$

since the quantity on the left is invariant under non-singular linear
transformations. Hence from (36),

$\frac{S_a - S_\xi}{240 - S_a} = \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j/v.$





From (34) and (35), this establishes the equivalence of the F tests. While the
proof has been given only for the type of data encountered in the example, the
same method will apply to other types of data.
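
The algebra behind this equivalence can be checked numerically in the smallest case, one discriminator and one covariance variate. The sketch below is an illustration under stated assumptions: the matrix E and the differences d are made-up numbers, not the insulin data, and the constant 240 is taken from the text. It compares the ratio (35) built from $S = E + dd'/240$ with the form $\sum E^{ij\cdot\xi}d'_i d'_j/v$, reading $v$ as $240 + E_\xi$.

```python
def quad_form_2x2(M, d):
    """d' M^{-1} d for a symmetric 2x2 matrix M = [[a, b], [b, c]]."""
    (a, b), (_, c) = M
    det = a * c - b * b
    return (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det

def both_sides(E, d, const=240.0):
    """Return (lhs, rhs) of the equivalence for p = 1, k = 1.

    Index 0 is the discriminator, index 1 the covariance variate.
    """
    # S_ij = E_ij + d_i d_j / const, as in the text.
    S = [[E[i][j] + d[i] * d[j] / const for j in range(2)] for i in range(2)]
    S_a = quad_form_2x2(S, d)
    S_xi = d[1] ** 2 / S[1][1]
    E_xi = d[1] ** 2 / E[1][1]
    lhs = (S_a - S_xi) / (const - S_a)
    # Adjusted difference and residual sum of squares after regression on x_xi.
    b = E[0][1] / E[1][1]
    d_adj = d[0] - b * d[1]
    E_res = E[0][0] - b * E[0][1]
    rhs = (d_adj ** 2 / E_res) / (const + E_xi)
    return lhs, rhs

lhs, rhs = both_sides([[10.0, 3.0], [3.0, 6.0]], [4.0, 2.0])
```

The two values agree to rounding error, which is the content of the chain (36) in this special case.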

In conclusion, we wish to thank the referees for many helpful suggestions in 
connection with the presentation of this paper. 

REFERENCES 

[1] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals
of Eugenics, Vol. 7 (1936), pp. 179-188.

[2] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of
Eugenics, Vol. 8 (1938), pp. 376-386.

[3] H. Hotelling, "The generalization of Student's ratio," Annals of Math. Stat., Vol.
2 (1931), pp. 360-378.

[4] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst.
Sci. Ind., Vol. 12 (1936), pp. 49-55.

[5] C. I. Bliss and H. P. Marks, "The biological assay of insulin," Quart. Jour. Pharm.
and Pharmacol., Vol. 12 (1939), pp. 82-110; 182-205.

[6] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938),
pp. 231-243.

[7] H. Cramér, Mathematical methods of statistics, Princeton University Press, 1946.

[8] M. S. Bartlett, "On the theory of statistical regression," Roy. Soc. Proc. Edin., Vol.
53 (1933), pp. 271-277.

[9] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 70.

[10] P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Memoirs,
Vol. 2 (1938), pp. 126-157.



ON THE KOLMOGOROV-SMIRNOV LIMIT THEOREMS FOR 
EMPIRICAL DISTRIBUTIONS 

By W. Feller 
Cornell University * 

Summary. Unified and simplified derivations are given for the limiting forms 
of the difference (1) between the empirical distribution of a large sample and the 
corresponding theoretical distribution and (2) between the distributions of two 
large samples. 


1. Introduction. Let $X_1, \cdots, X_N$ be mutually independent random variables
with the common cumulative distribution function F(x). Let $X_1^*, \cdots, X_N^*$
be the same set of variables rearranged in increasing order of magnitude.
The empirical distribution (or sum-polygon) of the sample $X_1, \cdots, X_N$ is the step
function $S_N(x)$ defined by

(1.1) $S_N(x) = \begin{cases} 0 & \text{for } x < X_1^* \\ k/N & \text{for } X_k^* \le x < X_{k+1}^* \\ 1 & \text{for } x \ge X_N^*. \end{cases}$

In other words, $N\cdot S_N(x)$ equals the number of variables $X_\nu$ which do not exceed
x. We expect intuitively that $S_N(x) \to F(x)$ as $N \to \infty$. In fact, if this were
not so the notions of distribution and sample would be meaningless. The
so-called $\omega^2$-criterion of von Mises [4] provides rough estimates for the probable
deviations of $S_N(x)$ from F(x) for certain forms of F(x) (cf. von Mises [4]). A
much stronger result is due to A. Kolmogorov and is of great interest in the
theory of non-parametric estimation (Kolmogorov [3]). The maximum of the
deviation $|S_N(x) - F(x)|$ is a random variable $D_N$ whose distribution is easily
seen to be independent of the special form of F(x) provided only that F(x) is
continuous.¹ The exact distribution of $D_N$ is not known, but Kolmogorov found
that $N^{1/2}D_N$ has a limiting distribution. More precisely we have
Theorem 1 (Kolmogorov [1]). Suppose that F(x) is continuous and define
the random variable $D_N$ by

(1.2) $D_N = \text{l.u.b.}_x\,|S_N(x) - F(x)|.$


•Research under an ONR contract. 

1 This fact will not be used explicitly in the sequel but follows as a byproduct from our
proofs. A simple direct proof consists in considering the random variables $\xi_k = F(X_k)$
which are uniformly distributed; the maximum deviation of the empirical distribution
of the new sample $\{\xi_k\}$ from the uniform distribution has the same distribution as $D_N$;
cf. Kolmogorov [1].




Then for every fixed z > 0 as $N \to \infty$

(1.3) $\Pr\{D_N < zN^{-1/2}\} \to L(z)$

where L(z) is the cumulative distribution function which for z > 0 is given by either
of the following equivalent relations²

(1.4) $L(z) = 1 - 2\sum_{\nu=1}^{\infty}(-1)^{\nu-1}e^{-2\nu^2z^2} = (2\pi)^{1/2}z^{-1}\sum_{\nu=1}^{\infty}e^{-(2\nu-1)^2\pi^2/8z^2}.$

For $z \le 0$ we have, of course, L(z) = 0.
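
The two representations of L(z) are easy to compare numerically. The following sketch is an illustration only: the function names and the truncation at 100 terms are choices made here. It evaluates both series and also computes the statistic $D_N$ of (1.2) for a given sample and a continuous F.

```python
import math

def kolmogorov_L_alternating(z, terms=100):
    """First form: 1 - 2 * sum (-1)^(nu-1) exp(-2 nu^2 z^2)."""
    if z <= 0:
        return 0.0
    total = sum((-1) ** (nu - 1) * math.exp(-2.0 * nu * nu * z * z)
                for nu in range(1, terms + 1))
    return 1.0 - 2.0 * total

def kolmogorov_L_theta(z, terms=100):
    """Second form: sqrt(2 pi) / z * sum exp(-(2 nu - 1)^2 pi^2 / (8 z^2))."""
    if z <= 0:
        return 0.0
    total = sum(math.exp(-((2 * nu - 1) ** 2) * math.pi ** 2 / (8.0 * z * z))
                for nu in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / z * total

def max_deviation(sample, F):
    """D_N = l.u.b. |S_N(x) - F(x)| for a continuous distribution function F."""
    xs = sorted(sample)
    n = len(xs)
    # At each ordered value the deviation is checked just after and just before the jump.
    return max(max((i + 1) / n - F(x), F(x) - i / n) for i, x in enumerate(xs))
```

For an observed sample one would compare $N^{1/2}D_N$ with L, for instance by evaluating `1 - kolmogorov_L_alternating(math.sqrt(n) * d)` as an approximate tail probability for large N.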

Equally interesting is Smirnov’s result concerning the maximum difference 
between the empirical distributions of two samples with the same cumulative 
distribution. 

Theorem 2 (Smirnov [5]). Let $(X_1, \cdots, X_m)$ and $(Y_1, \cdots, Y_n)$ be two samples
of mutually independent random variables having a common continuous
distribution F(x). Let $S_m(x)$ and $T_n(x)$ be the corresponding empirical distribution
functions and define a new random variable $D_{m,n}$ by

(1.5) $D_{m,n} = \text{l.u.b.}_x\,|S_m(x) - T_n(x)|.$

Put

(1.6) $N = \frac{mn}{m + n}$

and suppose that $m \to \infty$, $n \to \infty$ so that

(1.7) $m/n \to a$

where a is a constant. Then for every fixed z > 0

(1.8) $\Pr\{D_{m,n} < zN^{-1/2}\} \to L(z),$

where L(z) is the same as in (1.4).
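
As a small illustration of the quantities in Theorem 2, the sketch below computes $D_{m,n}$ and N for two samples. The function name is hypothetical; the only thing assumed beyond the text is that the maximum of $|S_m(x) - T_n(x)|$ is attained at one of the observed values, which holds because both functions are step functions that change only there.

```python
from bisect import bisect_right

def two_sample_statistic(xs, ys):
    """Return (D_{m,n}, N) with N = m n / (m + n), as in (1.5) and (1.6)."""
    m, n = len(xs), len(ys)
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    # bisect_right counts the observations that do not exceed t.
    d = max(abs(bisect_right(xs_sorted, t) / m - bisect_right(ys_sorted, t) / n)
            for t in points)
    return d, m * n / (m + n)
```

The product $N^{1/2}D_{m,n}$ is then the quantity whose limiting distribution is L(z).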

The original proofs (Kolmogorov [1] and Smirnov [6]) are very intricate and
are based on completely different methods. Kolmogorov's proof is based on an
auxiliary theorem of equal depth proved in a separate paper (Kolmogorov [2]).
An alternative proof of Kolmogorov's theorem is due to Smirnov [5]. However,
Smirnov derives both theorems as corollaries to much deeper (but less useful)
results concerning the number of intersections of the graphs of $S_N(x)$ and
$F(x) \pm zN^{-1/2}$ and of $S_m(x)$ and $T_n(x) \pm zN^{-1/2}$, respectively. It is, therefore, not
surprising that Smirnov's proofs require a powerful technique and many auxiliary
considerations. It is the purpose of the present paper to present unified proofs
of the two theorems which are based on methods of great generality.³ The new


2 The equivalence of the two formulas in (1.4) is a well-known relation often called
transformation formula for theta-functions. We shall only prove the first representation in
(1.4). The second is more useful for small z. A table of L(z) is given in Smirnov [6].
It is reprinted in the present issue of the Annals of Mathematical Statistics (pp. 279-281).

3 Among other results which can be proved by the same method are certain limit
theorems for ruin and first-passage time problems in the theory of diffusion and random walks.





proof is not simple but simpler than the original ones. At any rate, it requires 
essentially only routine manipulations with generating functions and their 
limiting form, the Laplace transforms. However, the paper aims mostly at a 
unification of methods. 

As a byproduct of the proof we obtain 

Theorem 3. Let $A_N$ be the number of points x where the step-polygon $S_N(x)$
of Theorem 1 leaves the strip $F(x) \pm zN^{-1/2}$. The expected value of the random
variable $A_N$ satisfies the asymptotic relation

(1.9) $E(A_N) \sim 2(2\pi N)^{1/2}\{1 - \Phi(2z)\},$

where $\Phi(z)$ is the normalized Gaussian distribution.

An analogous corollary to Theorem 2 was given by Smirnov [8]: formula (1.9)
holds also for the number of intersections of the graph of $S_m(x)$ with the
step-polygons $T_n(x) \pm zN^{-1/2}$. These results should come as a surprise to most
statisticians. According to Theorem 1 there is a positive probability that $A_N = 0$
and nevertheless $E(A_N)$ is of the order of magnitude $N^{1/2}$. The explanation
lies in the fact that if $S_N(x)$ crosses the curve $F(x) + zN^{-1/2}$ at some point then it
is extremely likely that $S_N(x)$ will in some neighborhood continue to fluctuate
around values $F(x) + zN^{-1/2}$, crossing that curve a great many times. The
difference $S_N(x) - F(x)$ exhibits, in the limit $N \to \infty$, many small oscillations. This
phenomenon is related to the well-known fact that the path of a particle subject
to the Einstein-Wiener diffusion process has no derivatives.

Instead of the absolute values of the differences we may consider the differ¬ 
ences themselves and derive two parallel theorems for the maximum and the 
minimum. As an example we shall prove 

Theorem 4. With the notations and assumptions of Theorem 1 let

(1.10) $D_N^+ = \text{l.u.b.}_x\{S_N(x) - F(x)\}.$

Then

(1.11) $\Pr\{D_N^+ < zN^{-1/2}\} \to 1 - e^{-2z^2}.$

The proof is simpler than that of Theorem 1 but uses the same method. 

2 . Notations and preliminary remarks. For printing convenience it is desir¬ 
able to avoid complicated subscripts and we shall therefore use the following 
notation for binomial coefficients 

(2.1) $C(n, k) = \binom{n}{k}.$

Similarly, for the general term of the binomial distribution we shall write

(2.2) $B(n, k; p) = C(n, k)\,p^k(1 - p)^{n-k}.$





If A is an event, $\bar{A}$ will denote its negation (complementary event). Finally

(2.3) $\Pr\{A \mid B\}$

denotes the conditional probability of A for given B.

Our proofs depend on a special case of the continuity theorem for
characteristic functions. Since we shall deal only with probability density functions
f(t) which vanish for t < 0 it is preferable to use, instead of the characteristic
function, the Laplace transform

(2.4) $\varphi(s) = \int_0^\infty e^{-st}f(t)\,dt.$

(This amounts to using the variable $-s$ instead of the usual $is$ and therefore
$\varphi(s)$ obeys the formal rules for characteristic functions.)

For any sequence $\{u_k\}$ $(k = 1, 2, \cdots)$ of non-negative numbers we define the
generating function $u(\lambda)$ by

(2.5) $u(\lambda) = \sum_{k=1}^{\infty}u_k\lambda^k.$

Now let $\delta > 0$ be fixed and consider the step-function $f_\delta(t)$ defined by

(2.6) $f_\delta(t) = u_k$ for $(k - 1)\delta \le t < k\delta$

$(k = 1, 2, \cdots$; $f_\delta(t) = 0$ for $t < 0)$. Its Laplace transform is

(2.7) $\varphi_\delta(s) = \frac{e^{\delta s} - 1}{s}\,u(e^{-\delta s}).$

We have, therefore, the continuity theorem: If, as $\delta \to 0$,

(2.8) $\delta u(e^{-\delta s}) \to \varphi(s),$

then for every fixed t > 0

(2.9) $u_k \to f(t)$ when $k\delta \to t$;

conversely, if (2.9) holds then (2.8) is true.
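
The relation between the step-function and the generating function can be checked directly for a finite sequence. In the sketch below the function names and the sample sequence are illustrative assumptions; the closed form tested is the expression $\frac{e^{\delta s} - 1}{s}u(e^{-\delta s})$, obtained by integrating $e^{-st}$ over each interval $((k-1)\delta, k\delta)$.

```python
import math

def generating_function(u, lam):
    """u(lambda) = sum over k >= 1 of u_k lambda^k, for u = [u_1, u_2, ...]."""
    return sum(uk * lam ** k for k, uk in enumerate(u, start=1))

def step_laplace_piecewise(u, delta, s):
    """Integrate e^{-st} f_delta(t) interval by interval."""
    return sum(uk * (math.exp(-s * (k - 1) * delta) - math.exp(-s * k * delta)) / s
               for k, uk in enumerate(u, start=1))

def step_laplace_closed_form(u, delta, s):
    """(e^{delta s} - 1) / s times u(e^{-delta s})."""
    return (math.exp(delta * s) - 1.0) / s * generating_function(u, math.exp(-delta * s))
```

Since $(e^{\delta s} - 1)/s$ is close to $\delta$ for small $\delta$, the transform differs from $\delta u(e^{-\delta s})$ only by a factor tending to 1, which is why (2.8) is stated in terms of the latter.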

3. Proof of Theorem 1. Since F(x) is continuous it is possible to define
numbers $x_k$ such that

(3.1) $F(x_k) = \frac{k}{N}, \quad (k = 1, 2, \cdots, N - 1).$

This definition is unique except when F(x) = k/N within an entire interval, in
which case we define $x_k$ as the left endpoint of that interval.

Let c > 0 be an integer. We shall evaluate the probability of the event
$D_N > c/N$ and we shall later put

(3.2) $c = zN^{1/2}, \quad N \to \infty.$





Suppose first that for some particular x

(3.3) $S_N(x) - F(x) > \frac{c}{N}.$

This point x is contained in a maximal interval in which (3.3) holds and at the
right endpoint $\xi$ of this interval we shall have

(3.4) $S_N(\xi) - F(\xi) = \frac{c}{N}.$

Now $S_N(\xi)$ is necessarily a number of the form r/N with an integer r. Since c
is an integer also $F(\xi) = k/N$ and hence $\xi = x_k$ for some k. From (3.4) we
conclude that

(3.5) $X^*_{k+c} < x_k, \quad X^*_{k+c+1} > x_k$

or in other words: exactly k + c among the N variables $X_\nu$ are smaller than $x_k$.
Denote this event by $A_k(c)$. The inequality (3.3) takes place for some x if, and
only if, at least one among the events $A_1(c), \cdots, A_N(c)$ occurs. The argument
applies equally to c < 0 and shows that the event $D_N > c/N$ occurs if, and only if,
at least one among the events

(3.6) $A_1(c), A_1(-c), A_2(c), A_2(-c), \cdots, A_N(c), A_N(-c)$

occurs.

Let $U_r$ and $V_r$ be the events that in the sequence (3.6) the first event to occur
is $A_r(c)$ or $A_r(-c)$, respectively. More formally, the events $U_r$ and $V_r$ are
defined by

(3.7) $U_r = \bar{A}_1(c)\bar{A}_1(-c)\cdots\bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)A_r(c),$
$V_r = \bar{A}_1(c)\bar{A}_1(-c)\cdots\bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)\bar{A}_r(c)A_r(-c).$

These events are mutually exclusive and therefore

(3.8) $\Pr\{D_N > c/N\} = \sum_{r=1}^{N}\Pr\{U_r\} + \sum_{r=1}^{N}\Pr\{V_r\}.$

From the very definitions we have the following two fundamental relations

(3.9) $\Pr\{A_k(c)\} = \sum_{r=1}^{k}\Pr\{U_r\}\Pr\{A_k(c) \mid A_r(c)\} + \sum_{r=1}^{k}\Pr\{V_r\}\Pr\{A_k(c) \mid A_r(-c)\},$
$\Pr\{A_k(-c)\} = \sum_{r=1}^{k}\Pr\{U_r\}\Pr\{A_k(-c) \mid A_r(c)\} + \sum_{r=1}^{k}\Pr\{V_r\}\Pr\{A_k(-c) \mid A_r(-c)\}.$





This is a system of 2N linear equations for the 2N unknowns $\Pr\{U_r\}$ and
$\Pr\{V_r\}$ and we proceed to solve it by the method of generating functions.

By definition of $x_k$ we have $\Pr\{X_\nu < x_k\} = k/N$. The probability of the event
$A_k(c)$ (that the same inequality holds for exactly k + c different $\nu$'s) is therefore
given by

(3.10) $\Pr\{A_k(c)\} = B(N, k + c; k/N)$

(cf. (2.2)). Similarly, it is readily verified that for r < k

(3.11) $\Pr\{A_k(c) \mid A_r(c)\} = B(N - r - c, k - r; (k - r)/(N - r))$

and

(3.12) $\Pr\{A_k(c) \mid A_r(-c)\} = B(N - r + c, k - r + 2c; (k - r)/(N - r)).$

The last three equations hold also for c < 0. They can be written in a more
convenient form in terms of the quantities

(3.13) $p_k(c) = e^{-k}\frac{k^{k+c}}{(k + c)!}.$

In fact

(3.14) $\Pr\{A_k(c)\} = \frac{p_k(c)\,p_{N-k}(-c)}{p_N(0)},$

(3.15) $\Pr\{A_k(c) \mid A_r(c)\} = \frac{p_{k-r}(0)\,p_{N-k}(-c)}{p_{N-r}(-c)},$

(3.16) $\Pr\{A_k(c) \mid A_r(-c)\} = \frac{p_{k-r}(2c)\,p_{N-k}(-c)}{p_{N-r}(c)}.$
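
The rewriting of (3.10) to (3.12) in terms of the $p_k(c)$ can be verified numerically. The sketch below assumes that $p_k(c)$ denotes the Poisson-type term $e^{-k}k^{k+c}/(k+c)!$, taken as zero when $k + c < 0$; this is the reading consistent with the binomial expressions and with the later appeal to the Poisson distribution. The function names are illustrative.

```python
import math

def binom_term(n, k, p):
    """B(n, k; p) = C(n, k) p^k (1 - p)^(n - k), zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def p_term(k, c):
    """Assumed form of p_k(c): e^{-k} k^{k+c} / (k+c)!."""
    if k + c < 0:
        return 0.0
    return math.exp(-k) * k ** (k + c) / math.factorial(k + c)

def check_identities(N, c, r, k):
    """Return pairs (binomial form, p-form) for (3.14), (3.15), (3.16)."""
    pair_314 = (binom_term(N, k + c, k / N),
                p_term(k, c) * p_term(N - k, -c) / p_term(N, 0))
    pair_315 = (binom_term(N - r - c, k - r, (k - r) / (N - r)),
                p_term(k - r, 0) * p_term(N - k, -c) / p_term(N - r, -c))
    pair_316 = (binom_term(N - r + c, k - r + 2 * c, (k - r) / (N - r)),
                p_term(k - r, 2 * c) * p_term(N - k, -c) / p_term(N - r, c))
    return pair_314, pair_315, pair_316
```

In each pair the exponential factors cancel, so the agreement is exact up to rounding; the common factor $p_{N-k}(-c)$ in all three numerators is the one that cancels in the next step.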

If these expressions are introduced into (3.9) the second factor in the numerator
cancels. A further simplification is achieved on introducing new sets of
unknowns

(3.17) $u_r = \frac{p_N(0)\Pr\{U_r\}}{p_{N-r}(-c)}, \quad v_r = \frac{p_N(0)\Pr\{V_r\}}{p_{N-r}(c)}.$

The fundamental equations (3.9) then reduce to

(3.18) $p_k(c) = \sum_{r=1}^{k}u_r\,p_{k-r}(0) + \sum_{r=1}^{k}v_r\,p_{k-r}(2c),$
$p_k(-c) = \sum_{r=1}^{k}u_r\,p_{k-r}(-2c) + \sum_{r=1}^{k}v_r\,p_{k-r}(0).$

This system is of the convolution type and can therefore be solved by means
of generating functions. The essential point is that the $p_k(c)$ are defined for
all k and that the system (3.18) therefore determines the unknowns $u_r$ and $v_r$
for all r > 0. We put

(3.19) $u(\lambda) = \sum_{k=1}^{\infty}u_k\lambda^k, \quad v(\lambda) = \sum_{k=1}^{\infty}v_k\lambda^k,$





(3.20) $p(\lambda; c) = N^{-1/2}\sum_{k=1}^{\infty}p_k(c)\lambda^k.$

(The factor $N^{-1/2}$ serves to simplify formulas.) Then obviously

(3.21) $p(\lambda; c) = u(\lambda)p(\lambda; 0) + v(\lambda)p(\lambda; 2c),$
$p(\lambda; -c) = u(\lambda)p(\lambda; -2c) + v(\lambda)p(\lambda; 0).$

From here we find $u(\lambda)$ and $v(\lambda)$. Equation (3.17) then determines $\Pr\{U_r\}$ and
$\Pr\{V_r\}$. Actually we are interested only in the two sums occurring in (3.8).
We put

(3.22) $\xi_k = \frac{1}{p_k(0)}\sum_{r=1}^{k}p_{k-r}(-c)\,u_r, \quad \eta_k = \frac{1}{p_k(0)}\sum_{r=1}^{k}p_{k-r}(c)\,v_r.$

Again, the $\xi_k$ and $\eta_k$ are defined for all k (also k > N). From (3.17) we have

(3.23) $\sum_{r=1}^{N}\Pr\{U_r\} = \xi_N, \quad \sum_{r=1}^{N}\Pr\{V_r\} = \eta_N,$

and hence finally, by (3.8)

(3.24) $\Pr\{D_N > c/N\} = \xi_N + \eta_N.$
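
For finite N the convolution system can also be solved term by term instead of through generating functions. The sketch below is an illustration under stated assumptions: it takes $p_k(c) = e^{-k}k^{k+c}/(k+c)!$ (zero when $k + c < 0$), reads (3.17) with the factor $p_N(0)$ so that the sums in (3.22) and (3.23) agree, and uses $p_0(0) = 1$ to isolate $u_k$ and $v_k$ in (3.18). The function names are hypothetical. For continuous F the events $D_N > c/N$ and $D_N \ge c/N$ differ only by a set of probability zero.

```python
import math

def p_term(k, c):
    """Assumed p_k(c) = e^{-k} k^{k+c} / (k+c)!, computed through logarithms."""
    if k + c < 0:
        return 0.0
    if k == 0:
        return 1.0 if c == 0 else 0.0
    return math.exp(-k + (k + c) * math.log(k) - math.lgamma(k + c + 1))

def prob_max_deviation_at_least(N, c):
    """Pr{D_N >= c/N} for an integer c >= 1, via (3.18), (3.22) and (3.24)."""
    u = [0.0] * (N + 1)
    v = [0.0] * (N + 1)
    for k in range(1, N + 1):
        # The r = k terms are u_k * p_0(0) = u_k and v_k * p_0(0) = v_k,
        # while p_0(2c) = p_0(-2c) = 0, so each equation yields one unknown.
        u[k] = p_term(k, c) - sum(u[r] * p_term(k - r, 0) + v[r] * p_term(k - r, 2 * c)
                                  for r in range(1, k))
        v[k] = p_term(k, -c) - sum(u[r] * p_term(k - r, -2 * c) + v[r] * p_term(k - r, 0)
                                   for r in range(1, k))
    total = sum(u[r] * p_term(N - r, -c) + v[r] * p_term(N - r, c)
                for r in range(1, N + 1))
    return total / p_term(N, 0)       # xi_N + eta_N
```

A convenient check is the case c = 1, where $D_N < 1/N$ requires exactly one observation in each of the N intervals between consecutive $x_k$, so that $\Pr\{D_N \ge 1/N\} = 1 - N!/N^N$. The recursion costs on the order of $N^2$ operations and subtracts nearly equal quantities, so it is meant for moderate N only.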

In (3.22) we find again simple convolutions leading to products of the corre¬ 
sponding generating functions. Thus 

(3.25) . 

,(X)-g,.X - p, ( 0) • 

We now pass to a study of the limiting form of these generating functions
as $N \to \infty$ and $c \to \infty$ in accordance with (3.2). Consider a fixed t > 0 and
suppose that

(3.26) $k/N \to t.$

From well-known properties of the Poisson distribution it follows then that

(3.27) $N^{1/2}p_k(c) \to (2\pi t)^{-1/2}\exp(-z^2/2t).$

Accordingly, the continuity theorem of section 2 implies (as can be verified
directly) that

(3.28) $p(e^{-s/N}; c) \to (2\pi)^{-1/2}\int_0^\infty t^{-1/2}\exp(-st - z^2/2t)\,dt = (2s)^{-1/2}\exp(-(2sz^2)^{1/2})$





(the last integral is well known and can be evaluated by elementary methods;
the square-root is always positive). We see in particular that the limiting form
is the same for $p(\lambda; c)$ and $p(\lambda; -c)$. It follows therefore from (3.21) directly
that

(3.29) $\lim u(e^{-s/N}) = \lim v(e^{-s/N}) = \frac{\exp(-(2sz^2)^{1/2})}{1 + \exp(-(8sz^2)^{1/2})}.$

Using this and the fact that $p_N(0) \sim (2\pi N)^{-1/2}$ we conclude from (3.25) that

(3.30) $\lim_{N\to\infty}N^{-1}\xi(e^{-s/N}) = \lim_{N\to\infty}N^{-1}\eta(e^{-s/N}) = \Big(\frac{\pi}{s}\Big)^{1/2}\frac{\exp(-(8sz^2)^{1/2})}{1 + \exp(-(8sz^2)^{1/2})} = \varphi(s).$

Expanding $\varphi(s)$ into a geometric series we get

(3.31) $\varphi(s) = \Big(\frac{\pi}{s}\Big)^{1/2}\sum_{\nu=1}^{\infty}(-1)^{\nu-1}\exp(-(8s\nu^2z^2)^{1/2}).$

From the evaluation of the integral in (3.28) we conclude that $\varphi(s)$ is the Laplace
transform of

(3.32) $f(t) = t^{-1/2}\sum_{\nu=1}^{\infty}(-1)^{\nu-1}\exp(-2\nu^2z^2/t).$

The continuity theorem of section 2 in conjunction with (3.30) and (3.26) shows
that

(3.33) $\lim_{N\to\infty}\xi_N = \lim_{N\to\infty}\eta_N = f(1).$

In view of (3.24) this accomplishes the proof.


4. Proof of Theorem 4. This proof is simpler than the preceding one
inasmuch as we are now interested only in the events $A_k(c)$ for c > 0. This time we
define $U_r$ as the event that r is the smallest subscript for which $A_r(c)$ occurs, that
is, $U_r = \bar{A}_1(c)\bar{A}_2(c)\cdots\bar{A}_{r-1}(c)A_r(c)$; no analogue to the event $V_r$ will be used.
With the same notations as before (3.9) is replaced by

(4.1) $\Pr\{A_k(c)\} = \sum_{r=1}^{k}\Pr\{U_r\}\Pr\{A_k(c) \mid A_r(c)\},$

and hence (3.21) by

(4.2) $p(\lambda; c) = u(\lambda)p(\lambda; 0).$

Here $p(\lambda; c)$ is the same as before, so that (cf. (3.29))

(4.3) $\lim_{N\to\infty}u(e^{-s/N}) = \exp(-(2sz^2)^{1/2}).$

Again, the first equation (3.25) holds without change and therefore we get
instead of (3.30)

(4.4) $\lim N^{-1}\xi(e^{-s/N}) = \Big(\frac{\pi}{s}\Big)^{1/2}\exp(-(8sz^2)^{1/2}).$



From (3.28) this is the Laplace transform of

(4.5) $f(t) = t^{-1/2} \exp(-2z^2/t).$

As before we conclude that $\xi_N \to f(1)$, which accomplishes the proof.

5. Proof of Theorem 3. We have seen in section 3 that the intervals in which
(3.3) holds are in a one-to-one correspondence with the events $A_k(c)$. Hence

(5.1) E(A n ) = 2Pr {A k (c )} +2Pr {A*(-c)). 

To evaluate the sums we use (3.10). If $N \to \infty$ and again $c = zN^{1/2}$, $k/N \to t$,
then by the central limit theorem

<*» * + "S ■ 

It follows then from (3.10) that

(5.3) $N^{-1/2} \sum_k \Pr\{A_k(c)\} \to (2\pi)^{-1/2} \int_0^1 \{t(1-t)\}^{-1/2} \exp\{-z^2/[2t(1-t)]\}\, dt.$

Call the right hand member $R$. After the substitution $t = \sin^2(\phi/2)$ we find

(5.4) $\dfrac{dR}{dz} = -8(2\pi)^{-1/2} z \int_0^{\pi/2} \sin^{-2}\phi\, \exp(-2z^2/\sin^2\phi)\, d\phi$

$\qquad\quad = 8(2\pi)^{-1/2} z \exp(-2z^2) \int_0^{\pi/2} \exp(-2z^2 \cot^2\phi)\, d(\cot\phi)$

$\qquad\quad = -2 \exp(-2z^2).$

Since $R \to 0$ as $z \to \infty$ we conclude that

(5.5) $R = 2\int_z^\infty \exp(-2x^2)\, dx = [1 - \Phi(2z)]\,(2\pi)^{1/2}.$

The same asymptotic estimate holds for the other sum in (5.1), and hence Theo¬ 
rem 3 is proved. 
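
The evaluation of $R$ can be confirmed numerically. The sketch below assumes the form obtained from the integral in (5.3) by the substitution $t = \sin^2(\phi/2)$, namely $R = (2\pi)^{-1/2}\int_0^\pi \exp(-2z^2/\sin^2\phi)\,d\phi$, and compares a trapezoidal estimate with the closed form (5.5). The names `R_numeric` and `R_closed` and the step count are illustrative assumptions.

```python
import math

def R_numeric(z, steps=4000):
    """(2 pi)^(-1/2) * int_0^pi exp(-2 z^2 / sin^2(phi)) d phi by the trapezoidal rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(1, steps):          # the integrand tends to zero at both endpoints
        s = math.sin(i * h)
        total += math.exp(-2.0 * z * z / (s * s))
    return h * total / math.sqrt(2.0 * math.pi)

def R_closed(z):
    """(5.5): [1 - Phi(2z)] (2 pi)^(1/2), using 1 - Phi(x) = erfc(x / sqrt(2)) / 2."""
    return math.sqrt(2.0 * math.pi) * 0.5 * math.erfc(2.0 * z / math.sqrt(2.0))

if __name__ == "__main__":
    for z in (0.25, 0.5, 1.0):
        print(z, R_numeric(z), R_closed(z))
```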

6. Proof of Theorem 2. Reorder the two samples in ascending order
of magnitude and denote the rearranged samples by $(X_1^*, \dots, X_m^*)$ and
$(Y_1^*, \dots, Y_n^*)$. When speaking of the graphs of the empirical distributions
$S_m(x)$ and $T_n(x)$ we shall suppose that they have been completed by adding
vertical segments so that the graphs become step-polygons. We shall put


v 7 I i 

771+71 

Then, according to. (1.6) and (1.7) 


m + n 


- o, N *= pn ■» 577i, 
<1 





Without loss of generality we shall suppose that 

(6.3) v ^ Q- 

In order to carry over the proof of Theorem 1 it is necessary to define the
events $A_k(c)$ in a judicious manner. For every integer $k > 0$ let $\nu_k$ be the number
of variables $X_j$ which are smaller than $Y_k^*$. In other words, $\nu_k$ is defined
as the integer for which

(6.4) $X^*_{\nu_k} < Y_k^* < X^*_{\nu_k + 1}.$

Finally put

(6.5) $a_k = \left[\dfrac{pk}{q}\right] = \left[\dfrac{mk}{n}\right],$

where, as usual, $[x]$ denotes the greatest integer contained in $x$.

For $0 < k \le n$ let $A_k(c)$ be the event that

(6.6) $\nu_k = a_{k+c}.$

The possibility of applying the proof of section 1 depends on the following

Lemma. Whenever

(6.7) $D_{m,n} > \dfrac{c}{n} > 0$

then at least one among the events $A_1(c), A_1(-c), \dots, A_n(c), A_n(-c)$ occurs.
Conversely, if one of these events occurs then

(6.8) $D_{m,n} > (c - 1)/n.$

Proof. If (6.7) holds then either for some $x_0$

(6.9) $S_m(x_0) - T_n(x_0) > \dfrac{c}{n}$

or the reversed inequality holds with $c$ replaced by $-c$. It suffices to consider
the case (6.9). For sufficiently large $x$ we have $S_m(x) = T_n(x) = 1$. Hence the
graphs of $S_m(x)$ and $T_n(x) + c/n$ must intersect at an abscissa $\xi > x_0$. The
point of intersection lies necessarily on a horizontal segment of the graph of
$S_m(x)$ and a vertical segment of $T_n(x) + c/n$. Hence there exists a $k$ such that
$\xi = Y_k^*$ and, moreover,

(6.10) $T_n(\xi-) + \dfrac{c}{n} \le S_m(\xi) \le T_n(\xi+) + \dfrac{c}{n}.$

This amounts to saying that

(6.11) $\dfrac{k - 1 + c}{n} \le \dfrac{\nu_k}{m} \le \dfrac{k + c}{n}.$

In view of (6.3) and (6.5) this relation implies (6.6).




Conversely, suppose that the event $A_k(c)$ occurs and let $c > 0$. Put again
$\xi = Y_k^*$. Then, by definition,

(6.12) $S_m(\xi) = \dfrac{\nu_k}{m} = \dfrac{a_{k+c}}{m}, \qquad T_n(\xi) = \dfrac{k}{n}.$

It follows that

(6.13) $S_m(\xi) > \dfrac{k + c}{n} - \dfrac{1}{m} = T_n(\xi) + \dfrac{c}{n} - \dfrac{1}{m},$

which in turn implies (6.8). This proves the lemma.

Theorem 2 is concerned with values of $c$ such that $cn^{-1} = zN^{-1/2}$; in passing to
the limit we must therefore put

(6.14) $c = z(n/p)^{1/2}.$

Accordingly, the relations (6.7) and (6.8) are asymptotically equivalent and our
lemma shows that, asymptotically, the probability of (6.7) is the same as the
probability that at least one among the events $A_1(c), \dots, A_n(-c)$ occurs.
To evaluate this probability we proceed exactly as in section 3. The events
$U_r$ and $V_r$ defined by (3.7) and the fundamental relations (3.9) hold again.
However (3.10)-(3.12) have to be replaced by new evaluations.

It is easily seen that the probability that exactly $r$ among the $X_j$ are smaller
than $Y_k^*$ is the same as the probability to extract exactly $r$ white balls before the
$k$-th black ball from an urn containing $m$ white and $n$ black balls (assuming
that all orders are equally likely and that balls are not replaced). In this way
one finds

(6.15) $\Pr\{A_k(c)\} = \dfrac{C(a_{k+c} + k - 1,\ k - 1)\; C(m + n - a_{k+c} - k,\ n - k)}{C(m + n,\ n)},$

(6.16) $\Pr\{A_k(c) \mid A_r(c)\} = \dfrac{C(a_{k+c} - a_{r+c} + k - r - 1,\ k - r - 1)\; C(m + n - a_{k+c} - k,\ n - k)}{C(m + n - a_{r+c} - r,\ n - r)},$

(6.17) $\Pr\{A_k(c) \mid A_r(-c)\} = \dfrac{C(a_{k+c} - a_{r-c} + k - r - 1,\ k - r - 1)\; C(m + n - a_{k+c} - k,\ n - k)}{C(m + n - a_{r-c} - r,\ n - r)}.$
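
The urn argument behind (6.15) can be checked by brute force for small $m$ and $n$. The sketch below writes $r$ for the value of $a_{k+c}$, enumerates all equally likely arrangements of $m$ white and $n$ black balls, and compares the observed frequency of "exactly $r$ white balls precede the $k$-th black ball" with the product of binomial coefficients. The function names are assumptions made for this example.

```python
from itertools import combinations
from math import comb

def prob_formula(m, n, k, r):
    """Form of (6.15) with a_{k+c} replaced by r."""
    return comb(r + k - 1, k - 1) * comb(m + n - r - k, n - k) / comb(m + n, n)

def prob_enumerated(m, n, k, r):
    """Fraction of arrangements in which exactly r whites precede the k-th black ball."""
    hits = total = 0
    for blacks in combinations(range(m + n), n):   # sorted positions of the black balls
        total += 1
        # the k-th black ball sits at index blacks[k-1]; k-1 blacks precede it
        if blacks[k - 1] - (k - 1) == r:
            hits += 1
    return hits / total

if __name__ == "__main__":
    m, n = 4, 3
    for k in range(1, n + 1):
        print(k, [round(prob_formula(m, n, k, r), 4) for r in range(m + 1)])
```

For each fixed $k$ the probabilities over $r = 0, \dots, m$ add up to one, as they must.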

The second binomial coefficient in the numerator is common to the three
expressions and cancels when the expressions are introduced into (3.9). These
fundamental relations assume a more natural form if the occurring binomial
coefficients are enlarged to terms of a binomial distribution. It is easily
verified that the first of the equations (3.9) reduces to

(6.18) $\dfrac{B(a_{k+c} + k - 1,\ k - 1;\ q)}{B(m + n,\ n;\ q)} = \sum_{r=1}^{k} \Pr\{U_r\}\, \dfrac{B(a_{k+c} - a_{r+c} + k - r - 1,\ k - r - 1;\ q)}{B(m + n - a_{r+c} - r,\ n - r;\ q)}$

$\qquad\qquad + \sum_{r=1}^{k} \Pr\{V_r\}\, \dfrac{B(a_{k+c} - a_{r-c} + k - r - 1,\ k - r - 1;\ q)}{B(m + n - a_{r-c} - r,\ n - r;\ q)}.$





The second equation is obtained on replacing the combination $k + c$ by
$k - c$.

Instead of (3.17) we put

(6.19) $u_r = \Pr\{U_r\}\, \dfrac{B(m + n,\ n;\ q)}{B(m + n - a_{r+c} - r,\ n - r;\ q)}, \qquad v_r = \Pr\{V_r\}\, \dfrac{B(m + n,\ n;\ q)}{B(m + n - a_{r-c} - r,\ n - r;\ q)}.$

Then (6.18) becomes

(6.20) $B(a_{k+c} + k - 1,\ k - 1;\ q) = \sum_{r=1}^{k} u_r\, B(a_{k+c} - a_{r+c} + k - r - 1,\ k - r - 1;\ q)$

$\qquad\qquad + \sum_{r=1}^{k} v_r\, B(a_{k+c} - a_{r-c} + k - r - 1,\ k - r - 1;\ q).$


This corresponds to the first equation in (3.18). Unfortunately (6.20) is not
of the pure convolution type since $a_{k+c} - a_{r+c}$ and $a_{k+c} - a_{r-c}$ are not functions
of the two variables $k - r$ and $c$. The trouble comes from the fact that $a_k$,
as defined by (6.5), is not a linear function of $k$. It is, however, plausible that
we shall commit only an asymptotically negligible error if we omit the brackets
in (6.5), that is, if we replace $a_k$ by $pk/q$. Purely formally (6.20) then reduces
to the first equation in (3.18) with

(6.21) $p_k(c) = B(p(k + c)/q + k - 1,\ k - 1;\ q).$


(Here the first argument in the right hand member is no longer necessarily an
integer, and the factorials in the definition (2.2) should be interpreted by means
of the gamma function.) To the new system (3.18) the considerations of section
3 apply almost word for word: the only difference lies in the new norming (6.14)
(which replaces (3.2)) and that instead of (3.26) we shall naturally let $k/n \to t$.
Thus the limiting form of Theorem 1 applies to the new system (3.18) with $p_k(c)$
defined by (6.21).

It remains to prove that the formal replacement of (6.20) and the corresponding
equation for $-c$, by (3.18) was legitimate. Now all coefficients
in (6.20) are of the form $B(\nu, r; q)$, and we have only changed the first argument,
$\nu$, adding a variable quantity which in no case exceeds one unit. In passing to
the limit we put $k \sim tn$ and $c \sim zn^{1/2}p^{-1/2}$. It follows that we actually use only
coefficients $B(\nu, r; q)$ where $\nu \to \infty$, $r \to \infty$ and $r/\nu \to q$. Accordingly, for
$|\vartheta| < 1$ we have $B(\nu + \vartheta, r; q) \sim B(\nu, r; q)$, and it is rather obvious that our system
(6.20) is asymptotically equivalent to (3.18).





REFERENCES

[1] A. Kolmogoroff, "Sulla determinazione empirica di una legge di distribuzione,"
Inst. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 1-11.

[2] A. Kolmogorov, "Über die Grenzwertsätze der Wahrscheinlichkeitsrechnung,"
Bulletin [Izvestija] Académie des Sciences URSS, (1933), pp. 363-372.

[3] A. Kolmogoroff, "Confidence limits for an unknown distribution function," Annals
of Math. Stat., Vol. 12 (1941), pp. 461-463.

[4] R. von Mises, Wahrscheinlichkeitsrechnung, F. Deuticke, Leipzig und Wien, 1931,
p. 316 seq.

[5] N. Smirnov, "Ob uklonenijah empiriceskoi krivoi raspredelenija," Recueil
Mathématique (Matematiceskii Sbornik), N.S. Vol. 6 (48) (1939), pp. 3-26.

[6] N. Smirnov, "On the estimation of the discrepancy between empirical curves of
distribution for two independent samples," Bulletin Mathématique de l'Université
de Moscou, Vol. 2 (1939), fasc. 2.



APPLICATION OF RECURRENT SERIES IN RENEWAL THEORY 

By Alfred J. Lotka 
Metropolitan Life Insurance Company 

Summary. The application of integral equations to renewal theory in popu¬ 
lation analysis and problems of industrial replacement is beset with certain diffi¬ 
culties which have been particularly discussed by W. Feller (these Annals 1941 
vol. 12 pp. 243-267). Some of these difficulties are avoided if the data of the 
problem are introduced into the analysis directly in the discontinuous form 
(tabulated by class intervals) in which they are usually supplied in a concrete 
case. A numerical example based on population statistics is presented, illustrat¬ 
ing how, using discontinuous data, a recurrent series takes the place of the integ¬ 
ral equation, and a finite exponential series appears in place of the Heaviside 
expansion of the previous solution. There is close analogy with the procedure 
previously presented, but with factorial moments appearing in place of ordinary 
moments. 

The fundamental data being given for values of the replacement function at 
discrete intervals only, some question arises as to the applicability of the solution 
as an “interpolation” formula for non-integral values of the time t , and as to the 
effects of subdividing the class interval of the original data. 

In the actual computation of the factorial moments a shift of origin by one- 
half class interval becomes necessary. An algorithm for effecting this shift is 
presented. 

1 . Methodology: Alternatives Available. 

All application of mathematics to concrete situations involves a greater or less 
degree of conventionalisation, a substitution, “in place of intractable reality, of 
an ideal upon which it is possible to operate.” 1 

This conventionalisation may be only such as to do little violence to the con¬ 
crete data, as for example when, dealing with a large population, we treat the 
number N(t) of individuals at time t as a continuous variable, knowing perfectly 
well that strictly speaking it varies by jumps of one unit at a time. 2 

In dealing with any particular concrete case there may be considerable choice 
as to the mode in which the conventionalisation or idealisation is carried out, 
and the particular place or step in the scheme at which it is introduced. A good 
illustration of this is met in the treatment of renewal theory, as applied to human 
populations or other biological or industrial aggregates. 

The majority of authors who have dealt with the subject have set up their
fundamental equations in terms of continuous variables. Many have gone further
than this in the process of conventionalisation and have assumed for the
renewal function (net reproductivity) some more or less appropriate mathematical
expression, such as a Charlier or a Pearson [1] frequency distribution, and
have, wherever possible, carried out by standard methods the integrations
involved.

1 Nature, Vol. 110 (1922), p. 764.

2 If the population is subject to extreme variation in numbers, such that N(t) passes
through small values, this disregard of their discontinuity may not be permissible.

Others, while retaining the formulation of the fundamental equations in con¬ 
tinuous (infinitesimal) form, have made no specific assumptions regarding the 
analytical form of the renewal function, and have carried out the numerical in¬ 
tegration by one of the established methods available for the approximate in¬ 
tegration of arbitrary functions. 

But there has also been a minority of authors who deemed it most appropriate, 
since the data of the problem are actually furnished in tabular (and hence dis¬ 
continuous) form, to apply from the start discontinuous methods in formulating 
the fundamental equation for the problem. This equation then defines a recur¬ 
rent series. 

The most recent and also the most concise exposition of this approach to the 
problem is a paper by W. Dobbernack and G. Tietz presented at the Twelfth 
International Congress of Actuaries, 1940, Proceedings , vol. 4, p. 233. These 
authors, however, do not give any numerical application, and in consequence 
certain aspects of the analysis are not touched upon by them. A more detailed 
presentation, including numerical applications, was given by the late S. D. Wicksell 3
who, however, used only roughly approximate data (an over-all average net
reproductivity for ages 20 to 44) and also introduced certain linear interpolations 
which would not be appropriate with more exact data, and which become un¬ 
necessary in the numerical operations if moments are introduced as indicated in 
what follows. 

The purpose of the present paper is to exhibit this modification of the method 
of recurrent series, and at the same time to illustrate its relation to the method 
which proceeds in terms of a continuous variable, leading to an integral equation. 

The $B(t - a)$ women born in the calendar year $(t - a)$, that is, between the
times $(t - \tfrac12 - a)$ and $(t + \tfrac12 - a)$, will be $a$ years old some time during the
calendar year $t$, that is, between $t - \tfrac12$ and $t + \tfrac12$. If their births were evenly
distributed over the year $t - a$, so will their birthdays of age $a$ be over the year
$t$, and their average age during that year will be $a$ and the average number of
survivors to that age during the year $t$ will be approximately $B(t - a)p(a)$, where
$p(a)$ is the probability, at birth, of surviving to age $a$. If the annual female
reproductive rate, counting daughters only, is $m(a)$ at age $a$, then the $B(t - a)p(a)$
survivors will, during the calendar year $t$, give birth to $B(t - a)p(a)m(a)$
daughters. If $B(t)$ is the total number of births of daughters in the calendar
year $t$, then evidently, for positive values of $t$,

(1) $B(t) = \sum_{a=1}^{\omega} B(t - a)\, p(a)\, m(a),$


3 [2]; see also [3].





or, to simplify the notation,

(2) $B(t) = \sum_{a=1}^{\omega} c_a B(t - a).$

Equation (1) or (2) defines a recurrent series of the general form

(3) $B(t) = c_1 B(t - 1) + c_2 B(t - 2) + \cdots + c_\omega B(t - \omega),$

where some of the coefficients $c$ may be zero and where $\omega$ denotes the upper limit
of the reproductive period.

The trial substitution

(4) $B(t) = Qx^{-t}$

in (3) gives

(5) $1 = c_1 x + c_2 x^2 + \cdots + c_\omega x^\omega.$

The substitution (4) therefore satisfies (3) provided that $x$ is a root of the
equation (5) of degree $\omega$ for $x$; and the same is evidently true for the more general
substitution

(6) $B(t) = \sum_{j=1}^{\omega} Q_j x_j^{-t},$

where $x_j$, with $j = 1, 2, \dots, \omega$, are the $\omega$ roots of (5).

Equation (5) leaves the $\omega$ coefficients $Q_j$ indeterminate. In general they
appear as arbitrary constants. In any concrete application they may be
determined by "initial" conditions; that is, in order to make the problem determinate,
it is necessary to be given the values of $B(t)$ for $\omega$ successive integral values of $t$,
or some equivalent data.
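
The trial substitution can be illustrated numerically. The sketch below is an illustration only: it takes the coefficients $c_a$ for integral ages from Table 2 of this paper (five-year units), locates the positive real root of (5) by bisection, and checks that $B(t) = x^{-t}$ then satisfies the recurrence (3). Bisection is used here for simplicity and is not claimed to be the method of the paper; the function names are assumptions.

```python
# c_a for integral ages a = 1, ..., 11 (five-year units), as printed in Table 2.
C = [0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427,
     0.18963, 0.10607, 0.02268, 0.00116, 0.00001]

def char_poly(x):
    """Right side minus left side of (5): sum_a c_a x^a - 1."""
    return sum(c * x ** a for a, c in enumerate(C, start=1)) - 1.0

def positive_root(lo=0.0, hi=2.0, iters=100):
    """Bisection; char_poly is increasing for x > 0 because every c_a >= 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if char_poly(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def recurrence_residual(x, t):
    """B(t) - sum_a c_a B(t - a) for the trial solution B(t) = x**(-t)."""
    return x ** (-t) - sum(c * x ** (-(t - a)) for a, c in enumerate(C, start=1))

if __name__ == "__main__":
    x0 = positive_root()
    print("positive root x0:", x0)
    print("residual of (3) at t = 20:", recurrence_residual(x0, 20))
```

The root found in this way lies close to the value .97322 quoted with Table 3.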

While, for convenience in description, the analysis has been developed in 
terms of the year as time unit, the formulae are evidently independent of this 
choice of unit, provided that the unit employed is adequate for practical appli¬ 
cation. 

Whatever the unit employed, for the direct application of (1) and (3) to a con¬ 
crete case it is necessary to have the data in such form that values of p(a)m(a) 
are known for integral values of a. The pertinent statistics do not usually come 
in that form, the fertility being usually known only for five year age groups, and 
though it may be sufficient for practical purposes to regard these quinquennial 
values as representing p(a)m(a) for the midpoint of the group, this yields p(a)- 
m(a) for fractional values of a, as measured in five year units. We may then 
proceed as follows: putting 


(7) $x = 1 + y$





in (5) this becomes

(8) $1 = \{c_1 + c_2 + c_3 + \cdots + c_\omega\}$

$\qquad + \{c_1 + 2c_2 + 3c_3 + \cdots + \omega c_\omega\}\, y$

$\qquad + \left\{c_2 + 3c_3 + 6c_4 + \cdots + \dfrac{\omega(\omega - 1)}{2!} c_\omega\right\} y^2$

$\qquad + \left\{c_3 + 4c_4 + 10c_5 + \cdots + \dfrac{\omega(\omega - 1)(\omega - 2)}{3!} c_\omega\right\} y^3$

$\qquad + \cdots + c_\omega y^\omega$

$\qquad = \sum_{h=0}^{\omega} \sum_{k=0}^{\omega} \binom{k}{h} c_k\, y^h.$

In application to a particular population, we shall usually have the condition
$c_a = 0$ for $a = 1, 2, \dots < \alpha$

where $\alpha$ is the lower limit of the reproductive period.

The expressions in brackets (coefficients of successive powers of $y$) will be
recognized as cumulations $S_h$ of the values of the function $c_a$, summed backwards to
the "diagonal" element $c_h$, where $h$ is the exponent of $y$. In terms of moments
$m$ of the function $c_a$, equation (8) can be written

(9) $1 = m_0 + m_1 y + \dfrac{m_2 - m_1}{2!} y^2 + \dfrac{m_3 - 3m_2 + 2m_1}{3!} y^3 + \cdots + c_\omega y^\omega$

or, using the symbol $m_{[h]}$ to denote the $h$th factorial moment, equation (9) takes
the simple form

(10) $1 = \sum_{h=0}^{\omega} \dfrac{m_{[h]}}{h!}\, y^h.$

In these expressions the moments $m_h$ and $m_{[h]}$ are those taken about $a = 0$.
Actually, the net reproduction rates are given for "semi-values" of $a$, that is, for
values of $a$ which are odd multiples of $\tfrac12$ (using five years as the time unit). By
cumulation of these given values moments $m'_h$ and $m'_{[h]}$ about $a = -\tfrac12$ are
obtained. 4 From the latter the corresponding functions of the moments about
$a = 0$ are obtained by the transformation formulae 5

(11) $\dfrac{m_{[h]}}{h!} = \sum_{k=0}^{h} \dfrac{(-\tfrac12)^{[k]}}{k!} \cdot \dfrac{m'_{[h-k]}}{(h - k)!} = \sum_{k=0}^{h} \dfrac{(-\tfrac12)^{[k]}}{k!}\, S_{h-k}.$


4 In these cumulations zero values of $c_a$ for $0 < a < \alpha$ must not be omitted.

5 In accordance with a customary notation the symbol $(-\tfrac12)^{[k]}$ denotes the continued
product $-\tfrac12(-\tfrac12 - 1)(-\tfrac12 - 2) \cdots (-\tfrac12 - k + 1)$. In the computation of successive terms,
in the sums in the right-hand member of (11), by appropriately laying out the work,
advantage is taken of the fact that values of $(-\tfrac12)^{[k]}/k!$ for $k = 2, 3, \dots$ are obtained each
from the preceding by multiplying successively by $\tfrac34$, $\tfrac56$, $\tfrac78$, etc., and taking care of the sign,
so that fractions with complicated numerators and denominators are avoided.





It will be recalled that in the treatment of the problem of replacement by
means of an integral equation, 6 a solution in the form

(12) $B(t) = \sum Q'_j x_j^{-t} = \sum Q'_j e^{r_j t},$

is obtained, in which the exponential coefficients $r_j$ are the roots of the equation

(13) $1 = \int_0^\infty e^{-ra} p(a)m(a)\, da = \int_0^\infty x^a p(a)m(a)\, da,$

i.e.

(14) $1 = m_0 - m_1 r + \dfrac{m_2}{2!} r^2 - \dfrac{m_3}{3!} r^3 + \cdots = \sum_{h=0}^{\infty} (-1)^h \dfrac{m_h}{h!}\, r^h,$

in close analogy to equation (10) for $y$, with the distinction however that in (10)
the factorial moments take the place of the ordinary moments of (14), and that
the series in (10) is finite, terminating at the term in $y^\omega$. There is also an
important difference between the characteristic equation (13) and its analogue (5),
namely that (5) may admit of negative roots for $x$, whereas (13) does not admit
negative values for $x$.

2. The constants Q. These are determined by initial conditions, as follows.
Equation (2) can be written

(15) $B(t) = \sum_{a=1}^{t-1} c_a B(t - a) + \sum_{a=t}^{\omega} c_a B(t - a) = F(t) + \sum_{a=1}^{t-1} c_a B(t - a),$

with

(16) $F(t) = \sum_{a=t}^{\omega} c_a B(t - a), \quad 0 < t \le \omega,$ and $F(t) = 0, \quad t > \omega.$

The values of $B(t)$ being given for integral values of $t$, from $t = -(\omega - 1)$ to
$t = 0$, it can be shown that 7

(17) $Q_j = \dfrac{\sum_{t=1}^{\omega} F(t)\, x_j^t}{\sum_{a=1}^{\omega} a c_a x_j^a}.$

6 For a discussion of the limits of applicability of this method see [4].

7 The reasoning is essentially the same as in the treatment of the problem by integral
equations. See [5] and [2, p. 39 et seq.].




In the special case that we are tracing the progeny of an initial population all
born at the same time, say $B(0)$ births occurring at $t = 0$, so that

(18) $B(-1) = B(-2) = \cdots = B(-[\omega - 1]) = 0,$

the expression for $Q_j$, in view of (5), reduces to a particularly simple form.
For if we write the summation in equation (16) in expanded form, we have

(19) $F(1) = c_1 B(0) + c_2 B(-1) + c_3 B(-2) + c_4 B(-3) + \cdots + c_\omega B(-[\omega - 1]),$

$\qquad F(2) = c_2 B(0) + c_3 B(-1) + c_4 B(-2) + \cdots + c_\omega B(-[\omega - 2]),$

$\qquad F(3) = c_3 B(0) + c_4 B(-1) + \cdots + c_\omega B(-[\omega - 3]),$

$\qquad \cdots$

$\qquad F(\omega) = c_\omega B(0).$

If now $B(-1), \dots, B(-[\omega - 1])$ all vanish, then

(20) $\sum_{t=1}^{\omega} F(t)\, x^t = B(0)\{c_1 x + c_2 x^2 + c_3 x^3 + \cdots + c_\omega x^\omega\}$

(21) $\qquad\qquad = B(0)$ by (5).

Hence,

(22) $Q_j = \dfrac{B(0)}{\sum_{a=1}^{\omega} a c_a x_j^a}.$

In particular

(23) $B(0) = \sum_{j=1}^{\omega} Q_j = B(0) \sum_{j=1}^{\omega} \dfrac{1}{\sum_{a=1}^{\omega} a c_a x_j^a},$

so that

(24) $\sum_{j=1}^{\omega} \dfrac{1}{\sum_{a=1}^{\omega} a c_a x_j^a} = 1.$

The constant $B(0)$ here evidently functions essentially as an arbitrary unit
of annual births, and may with this understanding simply be put $= 1$, thereby
simplifying the notation. This has been done in what follows, where convenient,
especially in the table of constants, Table 3 of the numerical illustration.
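
A small numerical experiment illustrates (22). The sketch below is an illustration only, again using the integral-age coefficients of Table 2: it generates the progeny of a single cohort with $B(0) = 1$ from the recurrence (3), and compares $B(t)$ for large $t$ with the leading term $Q_0 x_0^{-t}$, where $x_0$ is the positive root of (5) and $Q_0 = 1/\sum_a a c_a x_0^a$. The function names and the horizon of 120 time units are assumptions made for this example.

```python
# c_a for integral ages a = 1, ..., 11 (five-year units), as printed in Table 2.
C = [0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427,
     0.18963, 0.10607, 0.02268, 0.00116, 0.00001]
OMEGA = len(C)

def positive_root(lo=0.0, hi=2.0, iters=100):
    """Positive root of (5), sum_a c_a x^a = 1, by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(c * mid ** a for a, c in enumerate(C, start=1)) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cohort_births(t_max):
    """B(0), ..., B(t_max) from (3), with B(0) = 1 and B(t) = 0 for t < 0 as in (18)."""
    B = [1.0]
    for t in range(1, t_max + 1):
        B.append(sum(C[a - 1] * B[t - a] for a in range(1, min(t, OMEGA) + 1)))
    return B

x0 = positive_root()
Q0 = 1.0 / sum(a * c * x0 ** a for a, c in enumerate(C, start=1))   # (22) with B(0) = 1
B = cohort_births(120)
ratio = B[120] / (Q0 * x0 ** (-120))

if __name__ == "__main__":
    print("x0 =", x0, " Q0 =", Q0, " B(120) / (Q0 x0^-120) =", ratio)
```

The value of $Q_0$ obtained this way is close to the entry .17716 of Table 3, and the ratio tends to one because the terms belonging to the other roots are damped relative to the leading term.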

The denominator in (17) or (23) can be evaluated for any root $x_j$ of (5) by direct
summation if the coefficients $c_a$ are given or have been computed (as indicated
below) for integral values of $a$; or, in a manner similar to that employed in passing
from equation (5) to (8), the denominator can be expressed in terms of the
corresponding roots $y_j = x_j - 1$ of (8) or (10), the cumulations of $c_a$ being replaced
by cumulations of $ac_a$. With the denominator so expressed, the constants
$Q_j$ take the form, in obvious analogy to equation (9):

(25) $Q_j = \dfrac{\sum_{t=1}^{\omega} x_j^t F(t)}{m_1 + m_2 y_j + \dfrac{m_3 - m_2}{2!} y_j^2 + \dfrac{m_4 - 3m_3 + 2m_2}{3!} y_j^3 + \cdots}.$

The alternative procedure, to which reference was made in the preceding
paragraph, is to operate upon the moments $m_{[h]}$ (taken about the origin O) by a
process the inverse of cumulation, which we might term decumulation, and
in this way to obtain from them the coefficients $c_a$. The polynomial $\sum a c_a x_j^a$
can then be evaluated directly.

The decumulation is readily carried out by an algorithm which suggests itself
from the schedule of cumulation. Analytically the relation between the two
processes is expressed by the reciprocal sets of transformation formulae:

Cumulation

(26) $\dfrac{m_{[h]}}{h!} = \sum_{a=h}^{\omega} \binom{a}{h} c_a$

Decumulation

(27) $c_a = \sum_{h=a}^{\omega} (-1)^{h-a} \binom{h}{a} \dfrac{m_{[h]}}{h!}$
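
The reciprocal character of cumulation and decumulation can be sketched in a few lines. The code below assumes the binomial-transform reading of the two processes, $m_{[h]}/h! = \sum_a \binom{a}{h} c_a$ and $c_a = \sum_h (-1)^{h-a} \binom{h}{a}\, m_{[h]}/h!$, which is what the change of variable $x = 1 + y$ between (5) and (10) implies. The function names are hypothetical, and the test data are the integral-age values of Table 2.

```python
from math import comb

# c_a for integral ages a = 1, ..., 11 (five-year units), as printed in Table 2.
C = [0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427,
     0.18963, 0.10607, 0.02268, 0.00116, 0.00001]

def cumulate(c):
    """c[a-1] = c_a, a = 1..omega  ->  M[h] = m_[h]/h! = sum_a C(a, h) c_a, h = 0..omega."""
    omega = len(c)
    return [sum(comb(a, h) * c[a - 1] for a in range(1, omega + 1))
            for h in range(omega + 1)]

def decumulate(M):
    """Inverse transform: c_a = sum_{h >= a} (-1)^(h-a) C(h, a) M[h], a = 1..omega."""
    omega = len(M) - 1
    return [sum((-1) ** (h - a) * comb(h, a) * M[h] for h in range(a, omega + 1))
            for a in range(1, omega + 1)]

if __name__ == "__main__":
    M = cumulate(C)
    print("m_[h]/h!:", [round(m, 5) for m in M])
    print("recovered c_a:", [round(c, 5) for c in decumulate(M)])
```

Decumulating the cumulated values returns the original coefficients, and $\sum_h (m_{[h]}/h!)\, y^h$ agrees with $\sum_a c_a (1 + y)^a$ for any $y$.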

3. Constants Q associated with complex roots x = e u+t \ 

The complex roots $x_j$ give rise to oscillatory terms which, in the special case
of the progeny of a cohort of $B(0)$ births, take the form 8

(28) $\dfrac{2B(0)\, e^{ut}}{G^2 + H^2}\, (G \cos vt - H \sin vt)$

where

(29) $G = \sum_{a=1}^{\omega} a c_a e^{-ua} \cos va$

and

(30) $H = \sum_{a=1}^{\omega} a c_a e^{-ua} \sin va.$

These constants may be evaluated directly in this form, or, putting $y = \xi + i\eta$
in the denominator of equation (25), they can be expressed in terms of the
roots $y_j$ and the factorial moments obtained by cumulation of $ac_a$. 9


8 The development of these formulae is analogous to that followed in the treatment of
the problem by integral equations. See [6]; for the more general case see also [7].

9 The procedure in this case will be analogous to that followed in the development of
equations (90) and (91) in [6].





(a) Numerical Illustration . For convenience and to furnish the opportunity 
for comparison, the same data (United States 1920) were here employed as in 
the writer's earlier publications in which the problem was treated by the appli¬ 
cation of an integral equation. 


(b) Cumulation for values of $m_h$. The two operations, of (1) cumulating the
values of $c_a$ given for semi-values of $a$; and (2) allowing in the cumulated results
for a shift of origin from $a = -\tfrac12$ to $a = 0$, can be conducted in one schedule as
in Table 1. Cumulation is first carried out in the usual manner from the bottom
line to the diagonal, with the result appearing immediately below the diagonal.
From here on the procedure is as in the following example: Starting at the lower
right hand corner, we find

.00780 × (-1/2) = -.00390

.12770 × (-1/2) = -.06385     -.06385 × (-3/4) = .04789

.97395 × (-1/2) = -.48698     -.48698 × (-3/4) = .36524

                              .36524 × (-5/6) = -.30437.

Fig. 1. Net Fertility p(a)m(a). White females, United States, 1920
(Vertical axis: net fertility p(a)m(a); horizontal axis: age in 5 year units.
p(a) = probability for a newly-born female to attain age a; m(a) = probable number of
female births to a female between ages a - 1/2 and … (age in five year units).)

The verticals drawn in full and centered at mid-ages represent the original data; those
drawn in dashed lines and centered at integral ages are interpolated.




TABLE 1

Computation schedule for values of m[h]/h! of net productivity function p(a)m(a) = c_a
for integral values of age a.*

a in 5-year     c_a       m[0]       m[1]     m[2]/2!    m[3]/3!    m[4]/4!    m[5]/5!
units
(1)             (2)       (3)        (4)        (5)        (6)        (7)        (8)

                        1.16635    6.64127   16.64550   24.34106   23.16864   15.05650
                                   -.58318     .43738    -.36448     .31892    -.28703
0-1           .00000    1.16635    7.22445   -3.61223    2.70917   -2.25764    1.97544
1-2           .00000    1.16635    6.05810   19.82035   -9.91018    7.43264   -6.19387
2-3           .00040    1.16635    4.89175   13.76225   31.90655  -15.95328   11.96496
3-4           .09630    1.16595    3.72540    8.87050   18.14430   33.62800  -16.81400
4-5           .31255    1.06965    2.55945    5.14510    9.27380   15.48370   24.41095
5-6           .31025     .75710    1.48980    2.58565    4.12870    6.20990    8.92725
6-7           .23170     .44685     .73270    1.09585    1.54305    2.08120    2.71735
7-8           .15090     .21515     .28585     .36315     .44720     .53815     .63615
8-9           .05795     .06425     .07070     .07730     .08405     .09095     .09800
9-10          .00615     .00630     .00645     .00660     .00675     .00690     .00705
10-11         .00015     .00015     .00015     .00015     .00015     .00015     .00015

a in 5-year   m[6]/6!    m[7]/7!    m[8]/8!    m[9]/9!   m[10]/10!  m[11]/11!   Factor
units
(1)             (9)       (10)       (11)       (12)       (13)       (14)       (15)

              6.72500    1.99717     .36404     .03483     .00127     .00001
               .26311    -.24432     .22905    -.21633     .20551    -.19617    -21/22
0-1          -1.77790    1.62974   -1.51333    1.41875   -1.33993    1.27293    -19/20
1-2           5.41964   -4.87768    4.47121   -4.15184    3.89235   -3.67611    -17/18
2-3          -9.97080    8.72445   -7.85201    7.19768   -6.68356    6.26584    -15/16
3-4          12.61050  -10.50875    9.19516   -8.27564    7.58600   -7.04414    -13/14
4-5         -12.20548    9.15411   -7.62843    6.67488   -6.00739    5.50677    -11/12
5-6          12.38595   -6.19298    4.64474   -3.87062    3.38679   -3.04811     -9/10
6-7           3.45870    4.31260   -2.15630    1.61723   -1.34769    1.17923      -7/8
7-8            .74135     .85390     .97395    -.48698     .36524    -.30437      -5/6
8-9            .10520     .11255     .12005     .12770    -.06385     .04789      -3/4
9-10           .00720     .00735     .00750     .00765     .00780    -.00390      -1/2
10-11          .00015     .00015     .00015     .00015     .00015     .00015

* Figures immediately below the diagonal, obtained by cumulation from the bottom
upward of the data in Column 2, are factorial moments about a = -1/2. Figures in the top
line are factorial moments about a = 0. For use of factors in the last column see text.


The several columns are thus completed, and by addition, in each column, of
the item immediately below the diagonal, and of all the items above the
diagonal, the figures in the top line are obtained. These are the coefficients of
equation (10) for y.
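
The top line of Table 1 can be reproduced with a short computation. The sketch below is an illustration under stated assumptions: it takes the mid-age values $c_a$ of column (2) and obtains $m_{[h]}/h!$ about $a = 0$ in two ways, first directly with generalized binomial coefficients $\binom{a}{h}$ at $a = \tfrac12, \tfrac32, \dots$, and then by cumulating about $a = -\tfrac12$ and applying the successive factors $-\tfrac12, -\tfrac34, -\tfrac56, \dots$ as in (11). The function names are hypothetical.

```python
from math import comb

# Column (2) of Table 1: c_a at the mid-ages a = 0.5, 1.5, ..., 10.5 (five-year units).
C_SEMI = [0.0, 0.0, 0.00040, 0.09630, 0.31255, 0.31025,
          0.23170, 0.15090, 0.05795, 0.00615, 0.00015]

def gen_binom(x, k):
    """Generalized binomial coefficient x (x-1) ... (x-k+1) / k! for real x."""
    out = 1.0
    for i in range(k):
        out *= (x - i) / (i + 1)
    return out

def moments_direct(h_max):
    """m_[h]/h! about a = 0: sum over mid-ages of c_a * C(a, h)."""
    return [sum(c * gen_binom(i + 0.5, h) for i, c in enumerate(C_SEMI))
            for h in range(h_max + 1)]

def moments_shifted(h_max):
    """Same quantities via cumulation about a = -1/2 followed by the shift (11)."""
    # S[h]: binomial moments about a = -1/2, where the data sit at the integers 1..11.
    S = [sum(c * comb(i + 1, h) for i, c in enumerate(C_SEMI)) for h in range(h_max + 1)]
    return [sum(gen_binom(-0.5, k) * S[h - k] for k in range(h + 1))
            for h in range(h_max + 1)]

if __name__ == "__main__":
    for h, (d, s) in enumerate(zip(moments_direct(11), moments_shifted(11))):
        print(h, round(d, 5), round(s, 5))
```

Both routes agree, and the leading values match the first entries 1.16635, 6.64127, 16.64550 of the top line of Table 1 to within rounding.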





(c) Decumulation. While it is not necessary to carry out the decumulation, since
the entire computation can, if desired, be carried out in terms of y's and m's,
there is a considerable interest in noting the values $c_a$ for integral values of $a$
which result from the decumulations of the m's. These, together with the
original values for semi-values of $a$, are shown in Table 2 and Fig. 1.

TABLE 2

Values of c_a = p(a)m(a)

(1) for semi-values of a; original data.
(2) for integral values of a; computed by cumulation of original data, shift of
origin, and decumulation.

a (5-year units)    c_a       a (5-year units)    c_a       a (5-year units)    c_a

 0.0                0          4.0               .21781      8.0               .10607
 0.5                0          4.5               .31255      8.5               .05795
 1.0                0*         5.0               .33400      9.0               .02268
 1.5                0          5.5               .31025      9.5               .00615
 2.0                0*         6.0               .27427     10.0               .00116
 2.5               .00040      6.5               .23170     10.5               .00015
 3.0               .02073*     7.0               .18963     11.0               .00001
 3.5               .09630      7.5               .15090

* The value of $c_2$ came out negative, namely -.00057, and the value of $c_1$ came out
+.00014. In the computation of $\sum a c_a x^a$ these two values were arbitrarily adjusted to
zero, and $c_3$ was diminished from .02118 to .02073 to make the total $\sum_{a=1}^{11} c_a = 1.16635$,
summing only for integral values of a.


4. The roots of equations (5) and (8).

From the prior study already cited, the real positive and three pairs of complex
roots for $r$ of the characteristic equation

(31) $\int_\alpha^\omega x^a p(a)m(a)\, da = \int_\alpha^\omega e^{-ra} p(a)m(a)\, da = 1$

were known. These were used to indicate the approximate location of the roots
of (5) or (8), and more exact values were then obtained by Newton's method of
successive approximation. Table 3 shows the values of $u$, $v$, etc., corresponding
to the new roots

$y = x - 1 = e^{-(u + iv)} - 1$





obtained through equations (8) or (10); and, for comparison the corresponding 
values obtained in the previous publication from equation (13). 10 The same 
table also exhibits the remaining roots and values of the constants Q , G , H. 


TABLE 3

Constants of the series solution (6) of equation (3), corresponding to the five real and
three pairs of complex roots of the characteristic equation (5)

(United States, white females, 1920)

Constants (1)                  Five Real Roots                      Three Pairs of Complex Roots

A. Computed on basis of recurrent series

u              .02714*  -1.764†   -3.812†    -17.1†   -94.3†     -.19800    -.44720    -.47587
v              0.        0.        0.         0.       0.        1.06498    1.57000    2.40490
G              5.64467   7.73354  -1255.04    (2)      (2)       5.28093   10.45809    7.73103
H              0.        0.        0.         0        0         3.03239   -3.66726    2.00874
G/(G²+H²)      .17716    .12931   -.00080     (2)      (2)        .14241     .08515     .12117
H/(G²+H²)      0.        0.        0.         0        0          .08177    -.02986     .03148

B. Computed on basis of integral equation‡

u              .02714                                            -.1930     -.43655    -.4902
v              0.                                                1.0724     1.5771     2.44245
G              5.64514                                           5.15351   10.22495    7.40154
H              0.                                                2.98757   -3.72741    3.45312
G/(G²+H²)      .17715                                             .14525     .08620     .11095
H/(G²+H²)      0                                                  .08420    -.03135     .05175

(1) t in five year units
(2) Not computed
* $u_0 = -\log_e x_0 = -\log_e .97322 = .02714$
† Values of x
‡ See [6, p. 899]


To determine the remaining four roots, the product of the factors
$(y - y_1)(y - y_2) \cdots (y - y_7)$ was divided out of the polynomial of equation (10),
rejecting the remainder and leaving a fourth degree equation

$$y^4 + 120y^3 + 2590y^2 + 14617y + 23118 = 0.$$

In the subsequent work it turned out that the roots of this were all real, and
they were computed by obvious methods. Their values are also shown in
Table 3. For the two numerically largest roots great accuracy was not attempted.
They introduce terms with very rapid damping and presumably
very small values of Q.¹¹
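That the fourth degree equation has four real roots can be confirmed by a sign scan followed by bisection. This is only a sketch of one possible "obvious method"; the scan interval, the step of 0.25 and the helper names are assumptions made for the illustration, not the author's procedure.

```python
def p(y):
    """Fourth-degree polynomial left after dividing out the seven known factors."""
    return (((y + 120.0) * y + 2590.0) * y + 14617.0) * y + 23118.0


def bisect(f, lo, hi, tol=1e-12):
    """Shrink a sign-changing bracket [lo, hi] until it is narrower than tol."""
    flo = f(lo)
    while hi - lo > tol * max(1.0, abs(lo)):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# All coefficients are positive, so no root can be positive; scan the
# negative axis from -200 to 0 and refine every change of sign.
grid = [-200.0 + 0.25 * k for k in range(801)]
roots = [bisect(p, a, b) for a, b in zip(grid, grid[1:]) if (p(a) < 0) != (p(b) < 0)]
print(roots)
```

Four sign changes are found, and the sum of the refined roots agrees with the negative of the cubic coefficient, as it must.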


10 The divergence is due in part to details of computation. In the earlier publication 
the curve of fertility m(a) was smoothed by the method of translation, with a Gaussian 
distribution as basis. In the method here presented the raw data were used without 
smoothing, except such as is inherent in the process of the calculation described. 

11 At any rate, $Q_{10} + Q_{11}$ must be small, since $Q_1 + \cdots + Q_9 = 1.00313$, and according to
(24), with the convention that $B(0) = 1$, the sum of all the $Q_i$ must be equal to unity.





As a check, in order to be assured that no serious error was introduced in neglecting
the remainder after dividing out the factors $(y - y_1)$ up to $(y - y_7)$,
the product $\prod_{i=1}^{11} (y - y_i)$ was computed and, after multiplying by a factor to
make the absolute terms agree (.16635), was compared with the polynomial of
(10). As a further indication, the coefficients of the product $\prod$ were “decumulated”
to obtain values of coefficients of the corresponding polynomial in x, to

TABLE 4

Coefficients of Powers of y in Equation (10) and in the Product $\prod_{i=1}^{11}(y - y_i)$;
Also Coefficients of Powers of x in Equation (5)

            Coefficients of y^a                 Coefficients of x^a in Equation (5)
                                                Found by Decumulation
  a     In Equation (10)   In Π(y - y_i)     Of Column (2)    Of Column (3)
 (1)          (2)               (3)               (4)              (5)

  0          .16635            .16635          -1.00000          -.99915
  1         6.64127           6.64072          +.00014*          +.00065
  2        16.64550          16.64782          -.00057*          -.00432
  3        24.34106          24.24197           .02118*           .02398
  4        23.16840          23.18070           .21781            .21774
  5        15.05650          15.07338           .33400            .33354
  6         6.7250            6.73812           .27427            .27474
  7         1.99717           2.00316           .18963            .18882
  8          .36404            .36555           .10607            .10641
  9          .03483            .03501           .02268            .02276
 10          .00127            .00128           .00116            .00117
 11          .00001            .00001           .00001            .00001

* In computing the denominator of Q according to (22) the values of the coefficients
$c_1$ and $c_2$ were arbitrarily made zero and the value of $c_3$ (age 15) was adjusted
to .02073 to retain the total $\sum_{a=1}^{11} c_a = 1.16635$.

compare with values of c a . The results are shown in Table 4. In view of the 
fact that the (numerically) highest roots were determined only in first approxi¬ 
mation, the agreement is satisfactory. 

It is to be noted that instead of applying the solution (6) to compute values of
B(t), these latter can, of course, also be obtained directly, by carrying forward
step by step the original recurrent series; or, alternatively, the births in successive
generations can be computed step by step and the total births obtained
by addition. The advantage of the solution (6) is that it enables one, if desired,
to obtain B(t) for any value of t without having to compute B(t) for all intervening
values of t; also, the solution in an exponential series gives a better idea
of the general nature of the process, as well as a direct indication of its asymptotic
course for large values of t, when the first term $Q_0 x_0^{-t}$ with the positive real
root $x_0$ dominates all others. However this may be, it is interesting to compare

TABLE 5

Synopsis of Results of Computation of B(t) as $\sum Qx^{-t}$, Column (8), and as
$\sum B_n(t)$, Column (9), where $B_n(t)$ = Births per Unit of Time in nth
Generation at Time t. (Time Unit = 5 years)

                 $Qx^{-t} = x^{-t}/G$               $\frac{2e^{ut}}{G^2+H^2}(G \cos vt - H \sin vt)$          $\sum$

x* or u#    .97322*    -1.764*  -3.81208*     -.19800#   -.44720#   -.47587#
v          0          0          0            1.06498    1.57000    2.40490

  t
 (1)        (2)        (3)        (4)          (5)        (6)        (7)        (8)

  0      17,716     12,931        -80       28,482     17,030     24,234    100,313
  1      18,204     -7,330         21         -415      3,828    -13,781        527
  2      18,704      4,156         -6      -19,498     -6,959      3,329       -274
  3      19,219     -2,356          1      -15,222     -1,572      2,256      2,326
  4      19,748      1,336                   1,022      2,844     -3,362     21,588
  5      20,291       -757                  11,057        646      2,223     33,460
  6      20,850        429                   8,102     -1,162       -749     27,470
  7      21,423       -243                  -1,001       -265       -169     19,745
  8      22,013        138                  -6,248        475        445     16,823
  9      22,619        -78                  -4,294        109       -344     18,012
 10      23,241         44                     792       -194        145     24,028
 11      23,880        -25                   3,519        -45         -1     27,328
 12      24,538         14                   2,265         79        -55     26,841
 13      25,213         -8                    -568         18         51     24,706
 14      25,907          5                  -1,976        -32        -26     23,878
 15      26,619         -3                  -1,188         -8          4     25,424
 16      27,352          1                     385         13          6     27,757
 17      28,105         -1                   1,106          3         -7     29,206
 18      28,878          1                     620         -5          4     29,498
 19      29,673                               -251         -1         -1     29,420
 20      30,489                               -617          2         -1     29,874
 21      31,328                               -321          1          1     31,008
 22      32,191                                160         -1         -1     32,349
 23      33,076                                343                           33,419



























TABLE 5 (Continued)

                              $B_n(t)$, Generations n
  t   $\sum B_n(t)$       1         2         3         4         5         6
 (1)       (9)      (10)      (11)      (12)      (13)      (14)      (15)

  0   100,000
  1         0
  2         0
  3     2,072     2,072
  4    21,781    21,781
  5    33,398    33,398
  6    27,472    27,429        43
  7    19,866    18,963       903
  8    16,735    10,607     6,128
  9    17,954     2,268    15,685         1
 10    24,033       116    23,889        28
 11    27,361         1    27,022       338
 12    26,878              24,905     1,973
 13    24,696              18,481     6,214         1
 14    23,851              10,980    12,858        13
 15    25,410               5,345    19,941       124
 16    27,759               2,050    25,030       679
 17    29,219                 526    26,316     2,377
 18    29,506                  76    23,527     5,897         6
 19    29,414                   5    18,092    11,271        46
 20    29,862                        12,041    17,579       242
 21    31,000                         6,906    23,191       903
 22    32,348                         3,381    26,442     2,523         2
 23    33,423                         1,397    26,426     5,583        17


the result of the computation by means of the exponential series, carried out as 
set forth above, with the corresponding results of the computation of births in 
successive generations. This comparison is exhibited in Table 5. 

It will be seen that the agreement is good except for the second to fourth 
items, where perhaps the omission of the terms contributed by the numerically 
highest roots makes itself felt. 
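The two routes compared in Table 5 can be retraced numerically. The sketch below carries the recurrent series forward with the first-generation values of column (10) taken as $c_a$, and evaluates the exponential series from the Section A constants of Table 3. The form used for the complex-pair terms, $2e^{ut}(G\cos vt - H\sin vt)/(G^2+H^2)$, is the one that reproduces the tabulated columns (5) to (7); all names are illustrative, and the two omitted largest roots are left out here as well.

```python
import math

# First-generation births per unit initial birth (Table 5, column 10, divided
# by 100,000), used as the coefficients c_a for a = 3, ..., 11.
c = {3: 0.02072, 4: 0.21781, 5: 0.33398, 6: 0.27429, 7: 0.18963,
     8: 0.10607, 9: 0.02268, 10: 0.00116, 11: 0.00001}


def births_by_recurrence(t_max):
    """Carry B(t) = sum_a c_a * B(t - a) forward step by step, with B(0) = 1."""
    b = [1.0]
    for t in range(1, t_max + 1):
        b.append(sum(ca * b[t - a] for a, ca in c.items() if a <= t))
    return b


# Table 3, Section A. Real roots: (x, 1/G).
# Complex pairs: (u, v, G/(G^2+H^2), H/(G^2+H^2)).
real_terms = [(0.97322, 0.17716), (-1.764, 0.12931), (-3.812, -0.00080)]
complex_terms = [(-0.19800, 1.06498, 0.14241, 0.08177),
                 (-0.44720, 1.57000, 0.08515, -0.02986),
                 (-0.47587, 2.40490, 0.12117, 0.03148)]


def births_by_series(t):
    """Sum of the exponential-series terms for an integral value of t."""
    total = sum(q * x ** (-t) for x, q in real_terms)
    for u, v, g, h in complex_terms:
        total += 2.0 * math.exp(u * t) * (g * math.cos(v * t) - h * math.sin(v * t))
    return total


recurrence = births_by_recurrence(23)
for t in (0, 5, 10, 15, 20, 23):
    print(t, round(1e5 * recurrence[t]), round(1e5 * births_by_series(t)))
```

At t = 0 the series gives 1.00313 rather than unity, which is the discrepancy attributed in the text to the terms of the two numerically highest roots.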

5. Discussion. 

(a) The real roots of the characteristic equation (5). It can be shown [8] that only
one of the real roots for x can be positive, and that the absolute value of any
other root must be greater than the positive real root.





The negative real roots which make their appearance in the numerical
example call for special comment. Practically, the “higher” negative roots are
of little importance, at any rate in this example: first, because the constants Q
with which they are associated are relatively small; second, because large absolute
values of negative roots imply rapid damping, so that corresponding terms $Qx^{-t}$
very soon become negligible as t increases; third, because the determination of these
roots would be subject to a wide range of uncertainty, corresponding to the large
percentage fluctuations or errors of determination of the values of the functions
$p(a)m(a) = c_a$ at the upper end of the reproductive period.

But in theory these negative real roots suggest some pertinent questions. 
One wonders what would happen to them if the data were given, say, for single 
years of age, instead of 5-year groups. Instead of an equation of eleventh degree 
we would then have one of 55th degree. Furthermore, in those cases in which it 
may be permissible to pass to the limit, so that an integral equation takes the 
place of (2), negative roots for x would seem to be excluded as they would make 
the integral in (13) meaningless. 

A problem of perhaps little practical importance but of some theoretical interest
may arise here, to which reference has also been made by P. H. Leslie in a
recent article in Biometrika,¹² in connection with a different procedure.

(b) Effect of finer subdivisions of histogram of p(a)m(a). The effect of this on 
equation (5) for x is not obvious at sight, since new coefficients would be in¬ 
serted between previous terms. The effect is more easily understood from a con¬ 
sideration of equation (8) for y. Here finer subdivisions would introduce new 
terms only beyond the last term originally present. The original terms would 
not be changed at all in form , and those involving only lower moments would 
be changed but little in numerical value, provided that the original histogram were 
not so coarse as to give inappropriate values even for these lower moments. 

The result, then, of finer subdivision of the histogram, would be to change the 
computed values of the lower roots only in minor degree. But the four negative 
real roots, depending in considerable measure on the higher terms of (5) or (8), 
would presumably be materially altered, and might perhaps give place to further 
complex roots. In any case they would be followed by new roots even more
remote from practical significance than the original eleven.

(c) The result as an interpolation formula. Strictly speaking, the solution (6) 
of (2) is applicable only for integral values of t. In particular, terms arising 
out of the negative real roots of (5) for x are obviously not adapted to furnish 
interpolated values of B(t) for fractional values of t, since fractional powers of

12 See [9] and [10]. For a brief summary and analysis of Leslie's paper [9] see a review
signed with the initials WGB in the Jour. Inst. of Actuaries Students' Soc., Vol. 4 (1946),
Part II. The first application of the matrix method to these problems seems to be due to
H. Bernardelli, “Population Waves,” Jour. of Burma Research Soc., Vol. 31 (1941), Part I,





negative quantities in general are complex. Over the range of t where the first
real root together with the three pairs of complex roots adequately describe the
process under discussion, these terms alone are, in this sense and to this extent,
suitable for interpolation, disregarding the terms corresponding to the other negative
roots.
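The remark about fractional powers is easily illustrated. The sketch below evaluates the term $Qx^{-t}$ for the negative real root x = -1.764 of Table 3 at two integral and one fractional value of t; the function name and the choice t = 1.5 are assumptions made for the illustration.

```python
x = complex(-1.764)  # a negative real root of (5), from Table 3
q = 0.12931          # the associated constant Q = 1/G


def term(t):
    """Value of Q * x**(-t), computed in complex arithmetic."""
    return q * x ** (-t)


for t in (1, 1.5, 2):
    print(t, term(t))
```

For integral t the imaginary part vanishes and the sign alternates; for t = 1.5 the value is essentially purely imaginary, so the term supplies no usable real interpolate.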

Even less suitable for interpolation purposes, it would seem, would be terms 
arising from further negative roots that might be introduced by a finer sub¬ 
division of the histogram of original data. If we suppose this subdivision carried 
to great lengths, and if negative roots still appeared under these circumstances, 
they would give rise to rapidly oscillating positive and negative terms for even 
and odd integral values of t respectively (the time unit now being a subdivision 
of the original time unit) with no appropriate interpolation between these 
integral values. 

One further point calls for comment. In the process of idealization of the 
problem discussed, it has been assumed that p(a) and m(a) are independent of
time, and the conclusions reached must be construed in the light of this assump¬ 
tion. In itself this would hardly call for comment, as it is a matter of common 
understanding. But the question does arise whether the assumption itself is 
free from implied internal contradictions. 

In a recent publication, P. K. Whelpton¹³ has drawn attention to the fact that
in times of rapid changes in the birth rate, the assumption of age specific fertilities 
being held constant at the values observed in a given calendar year may imply 
that some of the women had more than one first child, a logical impossibility. 

The data used in the present numerical example are derived from a period of 
relatively undisturbed birth rate (1920), and do not involve any such conflict. 
But, in the light of Whelpton’s contribution one may ask the broader question 
whether the computation of an intrinsic rate of natural increase and related 
parameters based on age specific fertility as observed in one calendar year 
retains any practical value at all.

In answering this question, two considerations will be weighed. First, that 
ordinarily the rates computed in the usual way differ but little from those ob¬ 
tained by taking into account order of birth as in Whelpton’s procedure. Sec¬ 
ondly, that the computation using over-all values of m(a) for all orders of birth 
combined is a relatively simple matter based on data commonly available; 
whereas the more complete treatment of the problem taking into account order 
of birth is considerably more complicated and often not possible at all for lack 
of detailed data. 

REFERENCES 

[1] E. C. Rhodes, “Population mathematics,” Roy. Stat. Soc. Jour., (1940), pp. 229, 232.

[2] S. D. Wicksell, “Bidrag till den formella befolkningsteorien,” Statsøkonomisk Tidsskrift, Heft 1-2 (1934), p. 1.

13 See [11]. Another refinement recently introduced into the measurement of reproductivity
is to take into account duration of marriage. See Colin Clark and R. E. Dyne,
Economic Record (Australia), June 1946, p. 23.





[3] E. J. Gumbel, “Eine Darstellung statistischer Reihen durch Euler,” Jahresber. der Deutschen Mathematiker Vereinigung, Vol. 25, p. 257.

[4] W. Feller, “On the integral equation of renewal theory,” Annals of Math. Stat., Vol. 12 (1941), p. 243.

[5] A. J. Lotka, “Contribution to the theory of self-renewing aggregates with special reference to industrial replacement,” Annals of Math. Stat., Vol. 10 (1939), p. 11.

[6] A. J. Lotka, “The progeny of a population element,” Am. Jour. of Hygiene, Vol. 8 (1928), p. 892.

[7] A. J. Lotka, “The progeny of an entire population,” Annals of Math. Stat., Vol. 13 (1942), pp. 117-118.

[8] W. Dobbernack and G. Tietz, Twelfth International Congress of Actuaries, Vol. 4 (1940), p. 233.

[9] P. H. Leslie, “On the use of matrices in certain population mathematics,” Biometrika, Vol. 33 (1945), pp. 202 and 206.

[10] E. G. Lewis, “On the generation and growth of a population,” Sankhya, Vol. 6 (1942), Part I, p. 93 et seq.

[11] P. K. Whelpton, “Reproduction rates adjusted for age, parity, fecundity, and marriage,” Am. Stat. Assn. Jour., Vol. 41 (1946), p. 501.



SOLUTION OF EQUATIONS BY INTERPOLATION 

W. M. Kincaid 
University of Michigan 


Introduction and summary. The present paper deals with the numerical 
solution of equations by the combined use of Newton’s method and inverse in¬ 
terpolation. In Part I the case of one equation in one unknown is discussed. 
The methods described here were developed by Aitken [1] and Neville [2], but 
do not seem as widely known as they should be, perhaps because the original 
papers are not readily available. (A short summary of Aitken’s work will be 
found in a recent paper by Womersley [3].) Mention should also be made of 
an interesting paper by Spoerl [4], which treats the same problem from a some¬ 
what different viewpoint. 

In Part II these methods are extended to sets of simultaneous equations. 


PART I. EQUATIONS IN ONE UNKNOWN 

1. Nature of the problem. We first consider the problem of locating, to any
desired degree of accuracy, a real root $x_0$ of an equation of the form

(1) y(x) = 0

where y(x) is assumed to be analytic in an interval containing the root in question.
Since we shall not be concerned here with the necessary preliminary work
of separating the roots, etc., we may suppose that $x_0$ is known to lie within a
given interval that contains no zeros of y'(x). (Multiple roots are thus excluded;
but of course any such root is a simple root of an equation obtained from
(1) by differentiation, and the methods described below can be applied to this
equation.)


2. Aitken’s method of interpolation. The method to be described, which 
may be regarded as a generalization of Newton’s, depends on the use of inverse 
interpolation. It is therefore desirable to recall a few points from the theory of 
interpolation before proceeding further. 

Let f be a function such that f(t) is known for $t = t_1, t_2, \cdots, t_n$. Then the
Lagrange interpolating polynomial $f_{12 \cdots n}(t)$ is defined by

(2) $f_{12 \cdots n}(t) = f(t_1)\frac{(t - t_2)(t - t_3) \cdots (t - t_n)}{(t_1 - t_2)(t_1 - t_3) \cdots (t_1 - t_n)} + f(t_2)\frac{(t - t_1)(t - t_3) \cdots (t - t_n)}{(t_2 - t_1)(t_2 - t_3) \cdots (t_2 - t_n)} + \cdots + f(t_n)\frac{(t - t_1)(t - t_2) \cdots (t - t_{n-1})}{(t_n - t_1)(t_n - t_2) \cdots (t_n - t_{n-1})}$



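We note that

(3) $f_{12}(t) = \frac{1}{t_1 - t_2}\begin{vmatrix} f(t_1) & t - t_1 \\ f(t_2) & t - t_2 \end{vmatrix}, \quad f_{123}(t) = \frac{1}{t_1 - t_3}\begin{vmatrix} f_{12}(t) & t - t_1 \\ f_{23}(t) & t - t_3 \end{vmatrix}, \quad \cdots, \quad f_{12 \cdots n}(t) = \frac{1}{t_1 - t_n}\begin{vmatrix} f_{12 \cdots n-1}(t) & t - t_1 \\ f_{23 \cdots n}(t) & t - t_n \end{vmatrix},$

so that $f_{12 \cdots n}(t)$ can be evaluated for any given value $t_0$ of t by a succession of
linear interpolations. It is convenient to arrange the work in a table like the
following (n = 4):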


TABLE Ia

  t       f(t)        I             II             III             Parts

 t_1     f(t_1)                                                   t_0 - t_1
                    f_12(t_0)
 t_2     f(t_2)                   f_123(t_0)                      t_0 - t_2
                    f_23(t_0)                    f_1234(t_0)
 t_3     f(t_3)                   f_234(t_0)                      t_0 - t_3
                    f_34(t_0)
 t_4     f(t_4)                                                   t_0 - t_4


This form is well adapted for machine computation, for each denominator
$t_i - t_j = (t_0 - t_j) - (t_0 - t_i)$ automatically appears in one set of counters when
the corresponding numerator is obtained in the other.
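The scheme of successive linear interpolations can be sketched in a few lines and checked against a direct evaluation of the Lagrange polynomial (2). The sketch assumes the staggered arrangement in which each new entry combines two neighbouring entries of the preceding column; the function names and the cubic test data are illustrative only.

```python
def lagrange(ts, fs, t0):
    """Evaluate the Lagrange interpolating polynomial (2) directly at t0."""
    total = 0.0
    for i, (ti, fi) in enumerate(zip(ts, fs)):
        weight = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                weight *= (t0 - tj) / (ti - tj)
        total += fi * weight
    return total


def successive_linear(ts, fs, t0):
    """Evaluate the same polynomial by repeated linear interpolation.

    Returns the columns [f(t), I, II, ...]; the last column has one entry.
    """
    parts = [t0 - t for t in ts]
    columns = [list(fs)]
    for k in range(1, len(ts)):
        prev = columns[-1]
        col = []
        for i in range(len(ts) - k):
            # 2x2 determinant divided by t_i - t_{i+k} = parts[i+k] - parts[i]
            num = prev[i] * parts[i + k] - prev[i + 1] * parts[i]
            col.append(num / (parts[i + k] - parts[i]))
        columns.append(col)
    return columns


ts = [1.0, 2.0, 4.0, 5.0]
fs = [t ** 3 for t in ts]   # a cubic is reproduced exactly by four points
t0 = 3.0
columns = successive_linear(ts, fs, t0)
print(columns[-1][0], lagrange(ts, fs, t0))
```

Only the column of parts and the previous column are needed at each stage, which is what makes the arrangement convenient for desk calculation.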

If f'(t) is known at one or more of the given points, this information is readily
fitted into the scheme. For we see that

(4) $f_{11}(t) = \lim_{t_2 \to t_1} f_{12}(t) = f(t_1) + (t - t_1) f'(t_1)$

and all that is necessary is to repeat certain entries in Table Ia and to fill in column
I by using (4) as indicated in Table Ib. The extension to higher derivatives
is obvious.

TABLE lb 


t 

m 

I 

11 

in 

Parts 

h 

m 

fn(to) 



torU 


m 

fn(to) 

fm(to) 

fim(to) 

torh 

l 

tt 

m 

fn(to) | 

fm(to) 

\ 

fim(to) 

tort% 

t% 

m 

Mu) 

fm(to) 


to~ti 


m 

1 



tort% 





In applying the above to obtaining the root $x_0$ of (1), we must suppose that
y(x) is tabulated or can be computed for a set of values of x in the neighborhood
of $x_0$. What we do not know is the value of x corresponding to y = 0. It is
therefore convenient to regard x as a function of y whose value is known at certain
points and then interpolate to get $x_0 = x(0)$. That is, we let y take the place
of t and x that of f(t) in the preceding discussion, while 0 replaces $t_0$. The work
is slightly simplified by the fact that the column of “parts” becomes identical
with the left-hand column which contains the y's and can therefore be omitted.

3. Application to an example. The procedure will be most clearly indicated
by an example. Consider the equation

(5) $y = x^4 + 2x^3 - 5x^2 - 8x + 1 = 0,$

which has a root between 0 and 1. (If the root were located elsewhere, it would
be desirable to shift it to this interval in order to simplify the computation of y.)

The work of evaluating this root to ten places is summarized in Table II, and
explained below. In the first column, the numbers in parentheses are values
of $\frac{dy}{dx}$, and the other numbers are values of y, corresponding to the values of x
in the second column.


TABLE II

        y                 x            I                   II                  III

  1.000 000 000 000     0.000
 (-8.000 000 000)       0.000     0.125 000 000 00
  0.152 100 000 000     0.100     0.117 938 436 13    0.116 671 702 00
 -0.001 054 385 279     0.117         6 882 964 17        884 075 87
 (-9.081 459 548)       0.117           3 896 94            3 890 52      0.116 883 877 01
  0.008 022 855 936     0.116           3 842 98              90 67               90 68
 (-9.073 020 416)       0.116           4 254 15              90 74               90 68

$x_0 \approx 0.116\ 883\ 890\ 7$


The procedure is as follows. Taking x = 0 as a first approximation to $x_0$,
we find that y(0) = 1, y'(0) = -8, and record this data in the y and x columns
of the table. Note that for convenience, the value of y'(0) takes the place of a
blank entry in Table Ib. We now apply (4), which here takes the form

(6) $x_{11}(0) = x - y\left(\frac{dy}{dx}\right)^{-1} = 0 + \frac{-1}{-8} = 0 + 0.125$

and enter the result in column I. Note that this is equivalent to one step of
Newton’s method.

In view of (6) we take x = 0.1 for our next approximation and apply (3) to
obtain the second entry in column I and the first in column II. This last suggests
x = 0.117 for our next trial value. (We do not compute y'(0.1), as little
would be gained by doing so, and the time is better spent in going ahead as
indicated.) Finding y(0.117) and filling in the table gives us the root to six
places.
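The first entries of the work can be retraced as follows. The sketch performs the Newton step from x = 0, the linear inverse interpolation between x = 0 and x = 0.1, and then a single inverse interpolation through the four trial values 0, 0.1, 0.117 and 0.116. It is a simplified variant: the derivative entries at 0.117 and 0.116 used in the full table are left out, and the function names are illustrative.

```python
def y(x):
    """Left-hand side of equation (5)."""
    return x**4 + 2 * x**3 - 5 * x**2 - 8 * x + 1


def dy(x):
    """Derivative of (5) with respect to x."""
    return 4 * x**3 + 6 * x**2 - 10 * x - 8


def interpolate(ts, fs, t0=0.0):
    """Repeated linear interpolation; returns the highest-order value at t0."""
    vals = list(fs)
    n = len(ts)
    for k in range(1, n):
        vals = [(vals[i] * (t0 - ts[i + k]) - vals[i + 1] * (t0 - ts[i]))
                / (ts[i] - ts[i + k]) for i in range(n - k)]
    return vals[0]


newton_step = 0.0 - y(0.0) / dy(0.0)   # first entry of column I
xs = [0.0, 0.1, 0.117, 0.116]          # trial values of x
ys = [y(x) for x in xs]
linear = interpolate(ys[:2], xs[:2])   # x = 0 and x = 0.1 only
root = interpolate(ys, xs)             # x regarded as a function of y, at y = 0
print(newton_step, linear, root, y(root))
```

Here y plays the part of t and x that of f(t), so the same routine serves for direct and inverse interpolation.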

4. Employment of tables. Continuing in the same line, it would seem natural
to take x = 0.116884 at the next step; and doing so would lead to the most
rapid convergence. But another consideration enters. Up to this point the
values of y were computed with the aid of the WPA Table of Powers, which is
limited to three places in the argument. Rather than going to the extra labor
of evaluating y(.116884), we proceed as indicated in the table, using y'(.117),
y(.116) and y'(.116), and stopping when the values of x in the last column
agree to the desired number of places.

This point has been dwelt on because it is likely to arise whenever tables are 
used in evaluating y(x). In the example just given, to be sure, we had a certain 
freedom of choice; but if y(x) is not algebraic, direct computation may be quite 
impractical. It may be noted that in such cases the method of inverse inter¬ 
polation is not only faster than the simple Newton's method but is capable of 
giving more accurate results. 

The error in the final result can be estimated from the standard formula for 
the error of interpolation, but this may be awkward because it requires the 
evaluation of higher derivatives of x with respect to y. In practice it is generally 
safe to rely on agreement of different interpolated values, and of course the result 
may be checked by substitution in the original equation. One simple point is
worth noting, however: if the error in the original column of x's is $O(\epsilon)$, that in
the successive columns to the right is $O(\epsilon^2)$, $O(\epsilon^3)$, etc.

5. Applicability of the method. Although the example we have presented is 
algebraic, the method is, of course, equally applicable to transcendental equa¬ 
tions. Moreover, it can be used, theoretically at least, to yield complex as well 
as real roots. The sole difficulty is that the numerical work becomes cumbersome
in this case; how serious it is depends on the type of computing machines
used. If the equation is algebraic, Bernoulli's [5], [6] and Graeffe’s [7] methods 
are applicable. In fact, they are likely to be the most effective since they do not 
require prior knowledge of a first approximation to the root. If the alternative 
procedure of replacing the equation by two simultaneous equations for the real 
and imaginary parts of the root is decided upon, the methods described in the 
next section may prove useful. 

PART II. SETS OF SIMULTANEOUS EQUATIONS 

6. Two equations; general considerations. It is natural to take up next the 
problem of finding the simultaneous solutions of two equations in two unknowns. 
Let these equations be 

(7) u(x, y) = 0, v (x, y ) = 0, 

where u and v are analytic functions of x and y. 





If we had a general method of interpolation of functions of two independent 
variables, the problem could be solved in a fashion similar to that used in the 
preceding section. That is, u and v would be computed for values of x and y 
near the desired ones; then x and y would be regarded as functions of u and v 
and interpolations would be performed to obtain the values corresponding to 
u = v = 0.

It is easy to set up interpolating functions in a variety of ways, but the author 
has found none that are satisfactory for the problem in hand. Note that what is 
required is to determine the value of a function at any point in the plane, given 
its values at a set of fixed points. The most obvious idea is to use polynomials of 
the least possible degree for this purpose, as is done in the case of a single variable. 
In this case, however, the coefficients of a polynomial of the nth degree are determined
by its values not at n + 1 but at $\frac{(n + 1)(n + 2)}{2}$ points; thus if a function
is given at 5 points, no unique quadratic interpolating polynomial can be
constructed. What is worse, even if a function is given at 6 points, say, the 
quadratic polynomial determined will in general have large coefficients and take 
on unreasonable values if all the points happen to lie close to a common conic. 
Other schemes considered by the author have similar drawbacks, though the 
possibility of course remains of finding a suitable one by further research. 
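The count of points quoted above is simply the number of coefficients of a polynomial of degree n in two variables. A brief enumeration, with an illustrative function name, confirms the formula for small n.

```python
def coefficient_count(n):
    """Number of monomials x**i * y**j with i + j <= n."""
    return sum(1 for i in range(n + 1) for j in range(n + 1 - i))


for n in range(1, 5):
    print(n, coefficient_count(n), (n + 1) * (n + 2) // 2)
```

For n = 2 this gives six coefficients, which is why five points cannot fix a quadratic uniquely.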

The problem can also be handled, at least in principle, by eliminating one of the 
variables; but, apart from the difficulty of carrying this out in practice, the 
resulting single equation is likely to be more complicated in form than the original 
two. If so, solving it may require more computation than would be involved in 
attacking the original equations directly by the methods described below. So 
far is this true that even when a single equation is given in the first place it may 
be advantageous to replace it by a set of simpler equations. 


7. Newton’s Method. Although a direct extension of the method of inverse 
interpolation is not presently available, Newton’s method may be suitably 
generalized for this case. 

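Starting with equations (7), we set up the auxiliary variables

(8) $X = uv_y - vu_y, \qquad Y = uv_x - vu_x,$

where the subscripts denote partial derivatives ($u_x = \frac{\partial u}{\partial x}$, $u_y = \frac{\partial u}{\partial y}$, etc.).
We have

(9) $\frac{\partial X}{\partial x} = u_x v_y - v_x u_y + uv_{xy} - vu_{xy}, \qquad \frac{\partial X}{\partial y} = uv_{yy} - vu_{yy},$

$\frac{\partial Y}{\partial x} = uv_{xx} - vu_{xx}, \qquad \frac{\partial Y}{\partial y} = u_y v_x - v_y u_x + uv_{xy} - vu_{xy}.$

For u = 0, v = 0, equations (8), (9) reduce to

(10) $X = Y = \frac{\partial X}{\partial y} = \frac{\partial Y}{\partial x} = 0, \qquad \frac{\partial X}{\partial x} = -\frac{\partial Y}{\partial y} = J,$

where J is the Jacobian of u and v with respect to x and y.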

Equations (10) will hold approximately for values of x and y near those satisfying
equations (7). That is, in the neighborhood of a solution X can be regarded
as a function of x alone and Y as a function of y alone. Then if $x = x_0$,
$y = y_0$ is the desired solution, $(x_1, y_1)$ is a point in its neighborhood, and
$X_1 = X(x_1, y_1)$, $Y_1 = Y(x_1, y_1)$, $J_1 = J(x_1, y_1)$, we have

(11) $x_0 \approx x_1 - \frac{X_1}{J_1}, \qquad y_0 \approx y_1 + \frac{Y_1}{J_1}.$

Also if $(x_2, y_2)$ is another point near $(x_0, y_0)$,

(12) $x_0 \approx \frac{x_1 X_2 - x_2 X_1}{X_2 - X_1}, \qquad y_0 \approx \frac{y_1 Y_2 - y_2 Y_1}{Y_2 - Y_1}.$

Relations (11) and (12) can be used to obtain successive approximations to the
solution. Use of these relations corresponds to employing Newton’s method
and linear interpolation for the solution of one equation in one unknown.

As a first example we consider the equations

(13) $u \equiv x^2 + xy + y^2 - 3 = 0, \qquad v \equiv x^2 y + y^2 - 1 = 0.$

We have

(14) $u_x = 2x + y, \quad u_y = x + 2y, \quad v_x = 2xy, \quad v_y = x^2 + 2y.$

Drawing a rough graph indicates a solution near (1, 1). We evaluate u, v, etc.,
at this point as shown in Table III. Using (11) we get (2, 0) for a second approximation,
and proceed as before. We can now use both (12) and (11) to get new
approximations; they are (1.33, 0.57) and (1.25, 0.50), and are entered in the
last two columns of the table. We therefore try (1.3, 0.5) next, and continue
in this fashion until the desired accuracy is attained. Both (11) and (12) are
used at each step and the values obtained are entered in the last two columns.
The entries in the numbered rows are obtained by using (11), the others by using
(12). The number of places to take in each succeeding step is judged from the
agreement shown.
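The relations (11) and (12) are easily put in computational form for the equations (13). The sketch below reproduces the first Newton step from (1, 1) and the first linear interpolation between (1, 1) and (2, 0), and then simply iterates (11) from the trial point (1.5, 0.4). The trial values are rounded by hand in the text at each stage, which is not imitated here; the function names are illustrative.

```python
def u(x, y):
    return x * x + x * y + y * y - 3.0


def v(x, y):
    return x * x * y + y * y - 1.0


def partials(x, y):
    """Return u_x, u_y, v_x, v_y for the equations (13)."""
    return 2 * x + y, x + 2 * y, 2 * x * y, x * x + 2 * y


def auxiliaries(x, y):
    """Return X and Y of (8) and the Jacobian J at the point (x, y)."""
    ux, uy, vx, vy = partials(x, y)
    uu, vv = u(x, y), v(x, y)
    return uu * vy - vv * uy, uu * vx - vv * ux, ux * vy - uy * vx


def newton_step(x, y):
    """One application of relation (11)."""
    big_x, big_y, jac = auxiliaries(x, y)
    return x - big_x / jac, y + big_y / jac


def secant_step(p1, p2):
    """One application of relation (12) to two trial points."""
    x1_aux, y1_aux, _ = auxiliaries(*p1)
    x2_aux, y2_aux, _ = auxiliaries(*p2)
    x0 = (p1[0] * x2_aux - p2[0] * x1_aux) / (x2_aux - x1_aux)
    y0 = (p1[1] * y2_aux - p2[1] * y1_aux) / (y2_aux - y1_aux)
    return x0, y0


second = newton_step(1.0, 1.0)
third = secant_step((1.0, 1.0), second)
point = (1.5, 0.4)
for _ in range(8):
    point = newton_step(*point)
print(second, third, point)
```

Since X/J and Y/J are the Cramer's-rule solution of the linearized equations, the step (11) coincides with the ordinary Newton correction in two variables.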

Table IV indicates the process of finding a second solution of (13) by the same
method. The convergence is very rapid in this case, mainly because the first
guess is fairly close.



TABLE III

 Step       x             y             u_x           u_y           v_y

  1        1.0           1.0           3.0           3.0           3.0
  2        2.0           0.0           4.0           2.0           4.0
  3        1.3           0.5           3.1           2.3           2.69
  4        1.5           0.4           3.4           2.3           3.05
  5        1.51          0.37          3.39          2.25          3.0201
  6        1.5138        0.3750        3.4026        2.2638        3.04159044
  7        1.5138345     0.37499651    3.40266551    2.26382752    3.0416879134

[The columns for u, v, v_x, X, Y, J and the successive approximations are illegible in the source.]

$x_0 \approx 1.5138345184093$
$y_0 \approx 0.3749965131978$



TABLE IV

 Step      u_y          v_x              v_y               J

  1       -3.0         2.0              -1.0              9.0
  2       -3.3         1.82             -2.11            11.703
  3       -3.247       1.781052         -2.059399        11.28578997
  4       -3.24579     1.7797811654     -2.0587404871    11.27579469

Successive approximations:

        x                      y

   -.667                  -1.33
   -.6984                 -1.2721
   -.69872                -1.27378
   -.69879                -1.273524
   -.698770145            -1.27351068
   -.698770075686         -1.27351061019
   -.69877007573026       -1.27351061064354

[The columns for x, y, u, v, u_x, X and Y are illegible in the source.]

$y_0 \approx -1.27351061064$






8. Inverse interpolation. In the preceding section, attention was drawn to 
the difficulty that may arise when tables, necessarily limited to a certain number 
of places in the argument, are used in the computation. In the example just 
discussed the values of u and v were easily computed directly to the number of 
places wanted. But a glance at the work will show that if we had been limited in 
computing u and v to values of x and y having, say, two decimal places, the solu¬ 
tions could have been carried to four places only. 

The device adopted in the preceding section was to use quadratic and cubic 
interpolates to secure greater accuracy, and it might occur to us to try the same 
idea here. But for such an interpolation to be strictly valid, equations (10) 
would have to hold identically. Since they hold only approximately, an error 
is introduced which, in general, is of the same order of magnitude as the error in 
linear interpolation. Thus continuing the interpolation would not improve the 
results. 

However, this very situation suggests a way out. For suppose we give x a
constant value $x_1$, and compute X and Y for a number of values of y. For
$x = x_1$, both X and Y can be regarded as functions of y alone; or we can regard X
and y as functions of Y. Doing so, we can interpolate to any number of stages
to find values of X and y corresponding to Y = 0; call these $X_1$, $y_1$. Assigning
x other constant values $x_2, x_3, \cdots, x_m$, we repeat the process, getting a set
of values $X_2, \cdots, X_m$ and $y_2, \cdots, y_m$, all corresponding to Y = 0. Now
along the curve Y = 0 we can regard x and y as functions of X; performing one
more interpolation, we obtain the desired values of x, y corresponding to
X = Y = 0. The error in the final result can be estimated from the errors in the
interpolations, and is of the same order of magnitude as the greatest of these.

It will be noted that we did not refer to the definitions of X and Y in describing
this procedure. Any pair of independent (analytic) functions X' and Y' having
the property that X' = Y' = 0 when u = v = 0 could be used. However, it
is convenient to choose them so that $\frac{\partial X'}{\partial y}$ and $\frac{\partial Y'}{\partial x}$ are small. Probably the
simplest course is to set

$X' = a_1 u + b_1 v, \qquad Y' = a_2 u + b_2 v,$

where $a_1$, $a_2$, $b_1$, $b_2$ are constants such that

$\frac{a_1}{b_1} \approx -\frac{v_y}{u_y}, \qquad \frac{a_2}{b_2} \approx -\frac{v_x}{u_x}.$
Let us apply this procedure to the example we have already worked (Table III).
Suppose we wish to use values of x and y having not more than two decimal
places. Within this restriction, we can still carry through the first few steps
indicated in Table III to ascertain that $x_0 \approx 1.514$, $y_0 \approx 0.375$ where $(x_0, y_0)$
is the desired solution. At the point (1.51, 0.37) we have

X = 3.0201u - 2.25v, Y = 1.1174u - 3.39v.



TABLE V

    x      y        u          v           X'           Y'
  1.50   0.36    -.0804     -.0604      -.1404        .1008
  1.50   0.37    -.0581     -.0306      -.1406        .0337
  1.50   0.38    -.0356     -.0006      -.1406       -.0338
  1.50   0.39    -.0129      .0296      -.1404       -.1017
  1.51   0.36    -.0467     -.049564    -.038108      .101992
  1.51   0.37    -.0243     -.019463    -.038811      .034089
  1.51   0.38    -.0017      .010838    -.039314     -.034214
  1.51   0.39     .0211      .041339    -.039617     -.102917
  1.52   0.36    -.0128     -.038656     .064768      .103168
  1.52   0.37     .0097     -.008252     .063556      .034456
  1.52   0.38     .0324      .022352     .062544     -.034656
  1.52   0.39     .0553      .053156     .061732     -.104168
  1.53   0.36     .0213     -.027676     .168228      .104328
  1.53   0.37     .0439      .003033     .166501      .034801
  1.53   0.38     .0667      .033942     .164974     -.035126
  1.53   0.39     .0897      .065051     .163647     -.105453

Interpolation to Y' = 0 for each fixed x (intermediate interpolates omitted)

    x          y                X'
  1.50   .3750000007.3    -.1406250024.7
  1.51   .3749981706.2    -.03908741039.0
  1.52   .3749927660.3     .06302572977.3
  1.53   .3749839123.7     .1657149545.4

Interpolation to X' = 0 along Y' = 0

   X' (at Y' = 0)      x      successive interpolates for x
  -.1406250024.7     1.50
  -.03908741039.0    1.51    1.5138495506.5
   .06302572977.3    1.52    1.5138278531.3   1.5138345680.7
   .1657149545.4     1.53    1.5138624787.5   1.5138344615.8   1.5138345191.9


Noting the ratios of the coefficients of $u$ and $v$, we select

$$X' = 4u - 3v, \qquad Y' = u - 3v.$$

Next we evaluate $X'$ and $Y'$ for the 16 points having $x$-coordinates 1.50, 1.51, 1.52, 1.53 and $y$-coordinates 0.36, 0.37, 0.38, 0.39, as shown in Table V. Starting with the four points for which $x = 1.50$, we interpolate to find the values of $y$ and $X'$ corresponding to $Y' = 0$; they are $y_1 = .3750000007$, $X'_1 = -.1406250025$. We proceed in the same way with the points corresponding to the other values of $x$; the results, as shown, are $y_2 = .3749981706$, $X'_2 = -.03908741039$; $y_3 = .3749927660$, $X'_3 = .06302572977$; $y_4 = .3749839124$, $X'_4 = .1657149545$. (The extra digits given in Table V are to take care of rounding-off.) Finally, using these values, we interpolate to find the values of $x$ and $y$ corresponding to $X' = 0$, and get

$$x = 1.5138345192, \qquad y = .3749965140.$$

Comparing these results with those obtained earlier, we see that they are in error 
by about 1 unit in the ninth place; a distinct improvement over the four correct 
places that could have been secured without using this device. Note that if we 
had not had our earlier results for comparison, a check could have been obtained 
by carrying through the interpolation in the reverse order; i.e., starting with 
fixed values of y and finding values of x and Y’ corresponding to X' = 0. 
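
The two-stage interpolation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `neville` and the dictionary layout are hypothetical, Neville's iterated scheme is used for every interpolation, and the grid values are those of Table V, with entries that are illegible or inconsistent in the printed table recomputed from $X' = 4u - 3v$, $Y' = u - 3v$.

```python
def neville(xs, ys, t):
    """Value at t of the polynomial through the points (xs[i], ys[i]),
    computed by Neville's iterated linear interpolation."""
    p = list(ys)
    n = len(xs)
    for level in range(1, n):
        for i in range(n - level):
            p[i] = ((t - xs[i + level]) * p[i] - (t - xs[i]) * p[i + 1]) / (
                xs[i] - xs[i + level]
            )
    return p[0]


y_grid = [0.36, 0.37, 0.38, 0.39]

# X' = 4u - 3v and Y' = u - 3v on the 16 grid points (assumed from Table V).
grid = {
    1.50: {"X": [-0.1404, -0.1406, -0.1406, -0.1404],
           "Y": [0.1008, 0.0337, -0.0338, -0.1017]},
    1.51: {"X": [-0.038108, -0.038811, -0.039314, -0.039617],
           "Y": [0.101992, 0.034089, -0.034214, -0.102917]},
    1.52: {"X": [0.064768, 0.063556, 0.062544, 0.061732],
           "Y": [0.103168, 0.034456, -0.034656, -0.104168]},
    1.53: {"X": [0.168228, 0.166501, 0.164974, 0.163647],
           "Y": [0.104328, 0.034801, -0.035126, -0.105453]},
}

# Stage 1: for each fixed x, regard y and X' as functions of Y' and
# interpolate both to Y' = 0.
x_values, y_on_curve, X_on_curve = [], [], []
for x, row in sorted(grid.items()):
    x_values.append(x)
    y_on_curve.append(neville(row["Y"], y_grid, 0.0))
    X_on_curve.append(neville(row["Y"], row["X"], 0.0))

# Stage 2: along the curve Y' = 0, regard x and y as functions of X'
# and interpolate both to X' = 0.
x0 = neville(X_on_curve, x_values, 0.0)
y0 = neville(X_on_curve, y_on_curve, 0.0)

print(round(x0, 10), round(y0, 10))
```

Run on these data, the sketch should reproduce the values of $x$ and $y$ quoted above to within the rounding of the hand computation.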

As in the case of one equation in one unknown, derivatives could be brought into the interpolation scheme, permitting greater accuracy with fewer points. But the derivatives needed would be $\dfrac{\partial x}{\partial X'}$, $\dfrac{\partial x}{\partial Y'}$, $\dfrac{\partial^2 x}{\partial X'^2}$, etc., and the general setup would be rather awkward, so that extra labor would probably be required.


9. Three or more equations. The methods discussed in this section are readily extended to the solution of three or more simultaneous equations in an equal number of unknowns. For example, if we are given three equations of the form

$$u(x, y, z) = 0, \qquad v(x, y, z) = 0, \qquad w(x, y, z) = 0,$$

we define new variables

$$X = \begin{vmatrix} u & v & w \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{vmatrix}, \qquad Y = \begin{vmatrix} u_x & v_x & w_x \\ u & v & w \\ u_z & v_z & w_z \end{vmatrix}, \qquad Z = \begin{vmatrix} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u & v & w \end{vmatrix},$$

which are analogous to the $X$ and $Y$ of (8); from this point on the work is practically the same as before.


REFERENCES 

[1] A. C. Aitken, "On interpolation by iteration of proportional parts, without the use of differences," Edinb. Math. Soc. Proc., ser. 2, Vol. 3 (1932), pp. 56-76.

[2] E. H. Neville, "Iterative interpolation," Indian Math. Soc. Jour., Vol. 20 (1934), pp. 87-120.

[3] J. R. Womersley, "Scientific computing in Great Britain," Mathematical Tables and Aids to Computation, Vol. 2 (1946), pp. 110-117.

[4] C. A. Spoerl, "Solving equations in the machine age," Amer. Inst. Actuar. Record, Vol. 31, Part I (1942), pp. 129-149.

[5] T. C. Fry, "Some numerical methods for locating roots of polynomials," Quart. Appl. Math., Vol. 3 (1945), pp. 89-105.

[6] W. M. Kincaid, "Numerical methods for finding roots and vectors of matrices," Quart. Appl. Math., Vol. 5 (1947), pp. 320-345.

[7] E. Bodewig, "On Graeffe's method for solving algebraic equations," Quart. Appl. Math., Vol. 4 (1946), pp. 177-190.



ESTIMATION OF A PARAMETER WHEN THE NUMBER OF 
UNKNOWN PARAMETERS INCREASES INDEFINITELY WITH 
THE NUMBER OF OBSERVATIONS 

By Abraham Wald 
Columbia University 

Summary. Necessary and sufficient conditions are given for the existence of a uniformly consistent estimate of an unknown parameter $\theta$ when the successive observations are not necessarily independent and the number of unknown parameters involved in the joint distribution of the observations increases indefinitely with the number of observations. In analogy with R. A. Fisher's information function, the amount of information contained in the first $n$ observations regarding $\theta$ is defined. A sufficient condition for the non-existence of a uniformly consistent estimate of $\theta$ is given in section 3 in terms of the information function. Section 4 gives a simplified expression for the amount of information when the successive observations are independent.

1. Introduction. J. Neyman has recently treated the following estimation problem¹: Let $X_1, X_2, \cdots$, etc. be a sequence of independent chance variables the distribution of each of which depends on some unknown parameters. Two kinds of parameters are distinguished, structural and incidental parameters. A parameter $\theta$ is called structural if there exists an infinite subsequence of the sequence $\{X_i\}$ such that the distribution of each of the chance variables in the subsequence depends on $\theta$. Any parameter which is not structural is called incidental. Neyman has considered the case when there are a finite number of structural parameters, say $\theta_1, \cdots, \theta_k$, and an infinite sequence $\{\xi_i\}$, $(i = 1, 2, \cdots$, ad inf.), of incidental parameters. He has studied the problem of consistent and efficient estimation of the structural parameters and has obtained several interesting results. He has shown, among others, that the maximum likelihood estimate of a structural parameter $\theta$ need not be consistent, even when consistent estimates of $\theta$ exist. Neyman has also given a method for obtaining consistent estimates of the structural parameters. This method, however, is applicable only under certain restrictive conditions.

In this paper we shall consider a more general case than that treated by Neyman, but we shall concentrate on one aspect of the problem, namely that of the existence of consistent estimates.

Let $\{X_i\}$, $(i = 1, 2, \cdots$, ad inf.), be a sequence of chance variables, not necessarily independent of each other. It is assumed that for each $n$ the chance variables $X_1, \cdots, X_n$ admit a joint probability density function $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ where $\theta, \xi_1, \xi_2, \cdots$, etc. are unknown parameters.²

¹ Address given by J. Neyman at the meeting of the Institute of Mathematical Statistics in Atlantic City, January, 1947.

² While $\theta$ is assumed to be a real variable, we admit $\xi_i$ to be a finite dimensional vector, i.e., $\xi_i = (\xi_{i1}, \cdots, \xi_{ir})$ where $r$ may be any finite positive integer.

We shall require that the consistency relations among the density functions $p_1, p_2, \cdots$, etc. be fulfilled, i.e.,

$$\int_{-\infty}^{+\infty} p_{n+1}\, dx_{n+1} = p_n, \qquad (n = 1, 2, \cdots, \text{ad inf.}). \tag{1.1}$$

It should be remarked that it is not postulated that $p_n$ actually depends on all the parameters that appear as arguments in $p_n$. It is merely assumed that $p_n$ does not depend on any parameter that does not appear as an argument in $p_n$, i.e., $p_n$ does not depend on $\xi_i$ for any $i > n$. It follows, however, from (1.1) that if $p_n$ depends on a parameter $\xi$, then also $p_m$ depends on $\xi$ for any $m > n$.

Neyman's definition of structural and incidental parameters can be extended to the case of dependent observations considered here by saying that the distribution of $X_i$ does not depend on a parameter $\xi$ if and only if the conditional distribution of $X_i$ for any given values of $X_1, \cdots, X_{i-1}$ does not depend on $\xi$. It is not postulated that each of the parameters $\xi_1, \xi_2, \cdots$, etc. is incidental; some of them may be structural. We shall not make an explicit distinction between structural and incidental parameters, since for the purposes of the present paper this does not seem to be necessary.

In this paper we shall deal with the problem of formulating conditions under which a uniformly consistent estimate of $\theta$ exists. A statistic $t_n(x_1, \cdots, x_n)$ is said to be a uniformly consistent estimate of $\theta$ if for any positive $\delta$

$$\lim_{n = \infty} \text{prob.}\,(|t_n - \theta| < \delta) = 1 \tag{1.2}$$

uniformly in $\theta$ and the $\xi$'s.

In section 2 a necessary and sufficient condition is given for the existence of a uniformly consistent estimate of $\theta$. In section 3 the amount of information supplied by the first $n$ observations concerning $\theta$ is defined. It is then shown that if the amount of information is a bounded function of $n$ over a non-degenerate $\theta$-interval, no uniformly consistent estimate of $\theta$ exists. Section 4 gives a simplified formula for the amount of information in the case when the $X$'s are independently distributed.

2. A necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$. In deriving a necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$, use will be made of some results contained in a publication of the author [1] dealing with statistical decision functions which minimize the maximum risk. In [1] it is assumed that the domain of each of the unknown parameters is a closed and bounded set and that $p_n$ is continuous jointly in all of its arguments. Thus, in order to be able to use the results obtained in [1], we shall have to make the same assumptions here. In what follows we shall, therefore, assume that each of the parameters $\theta, \xi_1, \xi_2, \cdots$, etc. is restricted to a finite closed interval and that $p_n$ is a continuous function of $x_1, \cdots, x_n, \theta, \xi_1, \cdots, \xi_n$.

Let $[a, b]$ $(a < b)$ be the $\theta$-interval to which the values of $\theta$ are restricted. Clearly, if $t_n(x_1, \cdots, x_n)$, $(n = 1, 2, \cdots$, ad inf.), is a uniformly consistent estimate of $\theta$, then also $t_n^*$ is a uniformly consistent estimate of $\theta$, where $t_n^* = t_n$ when $a \le t_n \le b$, $t_n^* = a$ when $t_n < a$ and $t_n^* = b$ when $t_n > b$. Thus, without loss of generality, we can restrict ourselves to estimates $t_n$ which can take values only in the interval $[a, b]$. Uniform consistency of $t_n$ is then equivalent with the condition

$$\lim_{n = \infty} E[(t_n - \theta)^2 \mid \theta, \xi_1, \cdots, \xi_n] = 0 \tag{2.1}$$

uniformly in $\theta$ and the $\xi$'s. For any chance variable $u$ the symbol $E(u \mid \theta, \xi_1, \xi_2, \cdots)$ denotes the expected value of $u$ when $\theta, \xi_1, \xi_2, \cdots$ are the true parameter values.
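
The restriction to estimates with values in $[a, b]$ rests on the fact that truncating an estimate to the interval never moves it farther from a true value lying in the interval. A minimal Python sketch of that fact follows; the function name `truncate_estimate` and the numerical values are hypothetical and chosen only for illustration.

```python
def truncate_estimate(t, a, b):
    """Return t* = t if a <= t <= b, a if t < a, and b if t > b."""
    return min(max(t, a), b)


a, b = 0.0, 1.0                         # hypothetical theta-interval [a, b]
raw_estimates = [-0.4, 0.0, 0.3, 1.0, 1.7]
thetas = [0.0, 0.25, 0.5, 1.0]          # true values, all inside [a, b]

# The squared error of the truncated estimate never exceeds that of the raw one.
never_worse = all(
    (truncate_estimate(t, a, b) - theta) ** 2 <= (t - theta) ** 2
    for t in raw_estimates
    for theta in thetas
)
print(never_worse)
```

Since the inequality holds for every sample point, it also holds for the expected squared error in (2.1).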

In [1] a non-negative function $W(t_n, \theta)$, called weight function, is introduced which expresses the loss suffered when $t_n$ is the value of the estimate and $\theta$ is the true value of the parameter. The risk is defined in [1] as the expected value of the loss, i.e., the risk is given by

$$r_n(\theta, \xi_1, \cdots, \xi_n) = E[W(t_n, \theta) \mid \theta, \xi_1, \cdots, \xi_n]. \tag{2.2}$$

If we put $W(t_n, \theta) = (t_n - \theta)^2$, we have

$$r_n(\theta, \xi_1, \cdots, \xi_n) = E[(t_n - \theta)^2 \mid \theta, \xi_1, \cdots, \xi_n]. \tag{2.3}$$

It can easily be verified that Assumptions 1-4 in section 3 of [1] are fulfilled for the weight function $W(t_n, \theta) = (t_n - \theta)^2$.³ Thus, all results obtained in [1] can be applied to the risk function given in (2.3). According to Theorem 4.1 in [1] the risk function given in (2.3) is a continuous function of $\theta, \xi_1, \cdots, \xi_n$ for any arbitrary estimate $t_n$. We shall denote the maximum of (2.3) with respect to $\theta, \xi_1, \cdots, \xi_n$ by $r_n[t_n]$. Thus $r_n[t_n]$ is a functional which associates a non-negative value with any estimate function $t_n$.

It follows from (2.1) that $t_n$ is a uniformly consistent estimate of $\theta$ if and only if

$$\lim_{n = \infty} r_n[t_n] = 0. \tag{2.4}$$

For any $\theta$ and for any $n$ let $F_n(\xi_1, \cdots, \xi_n \mid \theta)$ be a cumulative distribution function of $\xi_1, \cdots, \xi_n$. Let, furthermore,

$$q_n(x_1, \cdots, x_n \mid \theta, F_n) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)\, dF_n(\xi_1, \cdots, \xi_n \mid \theta). \tag{2.5}$$

We do not require that $F_1, F_2, \cdots$, etc. satisfy the consistency relations, i.e., $\lim_{\xi_{n+1} = \infty} F_{n+1}(\xi_1, \cdots, \xi_{n+1} \mid \theta)$ is not necessarily equal to $F_n(\xi_1, \cdots, \xi_n \mid \theta)$. Hence, also the distributions $q_n$ do not necessarily satisfy the consistency relations. Clearly

$$r_n[t_n] \ge \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (t_n - \theta)^2\, q_n(x_1, \cdots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n \tag{2.6}$$

for any $\theta$ and any $F_n$. Hence, (2.4) and (2.6) imply that if $t_n$ is a uniformly consistent estimate of $\theta$, then $t_n$ remains a uniformly consistent estimate of $\theta$ also when $q_n$ is the distribution of $X_1, \cdots, X_n$ for any arbitrary choice of $F_n$.

³ In verifying Assumption 4, we may assume that $p_n$ is always $> 0$, since for any given values $\theta, \xi_1, \cdots, \xi_n$ we may restrict the domain of $(x_1, \cdots, x_n)$ to the subset of the sample space where $p_n > 0$.

For each $n$ let $C_n(\theta, \xi_1, \cdots, \xi_n)$ be a joint cumulative distribution function of $\theta, \xi_1, \cdots, \xi_n$. If this is regarded as an a priori distribution of $\theta, \xi_1, \cdots, \xi_n$ and if our aim is to choose $t_n$ so that

$$E(t_n - \theta)^2 = \int \cdots \int (t_n - \theta)^2\, p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)\, dC_n\, dx_1 \cdots dx_n \tag{2.7}$$

is a minimum, then the best choice of $t_n$ is to put it equal to the a posteriori mean value of $\theta$. Let $t_n^*(x_1, \cdots, x_n; C_n)$ denote the a posteriori mean value of $\theta$ when $C_n$ is the a priori distribution, i.e.,

$$t_n^*(x_1, \cdots, x_n; C_n) = \frac{\int \theta\, p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)\, dC_n}{\int p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)\, dC_n}, \tag{2.8}$$

where the integration is to be taken over the whole domain of the parameters $\theta, \xi_1, \cdots, \xi_n$. Let, furthermore, $\bar{r}_n[C_n]$ denote the value of (2.7) when $t_n = t_n^*(x_1, \cdots, x_n; C_n)$. According to Theorem 4.4 in [1] there exists a particular distribution $C_n^0$, called a least favorable distribution, such that

$$\bar{r}_n[C_n] \le \bar{r}_n[C_n^0] \tag{2.9}$$

for all $C_n$. Let

$$t_n^0(x_1, \cdots, x_n) = t_n^*(x_1, \cdots, x_n; C_n^0). \tag{2.10}$$
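
The statement that the a posteriori mean (2.8) minimizes the average risk (2.7) can be checked numerically in a toy case. The sketch below is an illustration only, under assumptions not in the text: there are no incidental parameters, the observation is a single binomial count, the a priori distribution is a hypothetical three-point distribution, and all function names are invented for the example.

```python
from math import comb


def binom_pmf(x, n, theta):
    """Probability of x successes in n trials with success probability theta."""
    return comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)


def posterior_mean(x, n, thetas, weights):
    """A posteriori mean of theta: ratio of the two sums corresponding to (2.8)."""
    num = sum(th * binom_pmf(x, n, th) * w for th, w in zip(thetas, weights))
    den = sum(binom_pmf(x, n, th) * w for th, w in zip(thetas, weights))
    return num / den


def bayes_risk(estimator, n, thetas, weights):
    """Average of E(t - theta)^2 over the a priori distribution, as in (2.7)."""
    return sum(
        w * sum(binom_pmf(x, n, th) * (estimator(x) - th) ** 2 for x in range(n + 1))
        for th, w in zip(thetas, weights)
    )


n = 10
thetas = [0.2, 0.5, 0.8]      # hypothetical support of the a priori distribution
weights = [0.3, 0.4, 0.3]     # hypothetical a priori probabilities

risk_posterior = bayes_risk(lambda x: posterior_mean(x, n, thetas, weights), n, thetas, weights)
risk_frequency = bayes_risk(lambda x: x / n, n, thetas, weights)
print(risk_posterior <= risk_frequency)
```

Any other estimate, here the observed relative frequency, has an average risk at least as large as that of the a posteriori mean.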

It follows from Theorems (4.5) and (5.1) in [1] that for any estimate $t_n$ we have

$$r_n[t_n] \ge r_n[t_n^0] = \bar{r}_n[C_n^0]. \tag{2.11}$$

Hence, a necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$ is that

$$\lim_{n = \infty} \bar{r}_n[C_n^0] = 0. \tag{2.12}$$

Let $F_n(\xi_1, \cdots, \xi_n \mid \theta)$ denote the conditional cumulative distribution of $\xi_1, \cdots, \xi_n$ for given $\theta$ that results from the joint distribution $C_n(\theta, \xi_1, \cdots, \xi_n)$ and let $F_n^0(\xi_1, \cdots, \xi_n \mid \theta)$ correspond to $C_n^0(\theta, \xi_1, \cdots, \xi_n)$. Clearly, any uniformly consistent estimate of $\theta$ with respect to $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ is a uniformly consistent estimate also with respect to $q_n(x_1, \cdots, x_n \mid \theta, F_n)$ for any $F_n$. On the other hand, if $q_n(x_1, \cdots, x_n \mid \theta, F_n^0)$ admits a uniformly consistent estimate of $\theta$, equation (2.12) must hold and, therefore, $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ admits a uniformly consistent estimate of $\theta$. Hence we arrive at the following theorem:

Theorem 2.1. A necessary and sufficient condition that $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ admit a uniformly consistent estimate of $\theta$ is that $q_n(x_1, \cdots, x_n \mid \theta, F_n)$ admit a uniformly consistent estimate of $\theta$ for any arbitrary choice of $F_n$.

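3. Amount of information contained in the first $n$ observations concerning the parameter $\theta$. We shall make the following assumptions:

Assumption 1. The first two derivatives of $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ with respect to $\theta$ exist.

Assumption 2. We have

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \max_{\theta} \left| \frac{\partial p_n}{\partial \theta} \right| dx_1 \cdots dx_n < \infty \tag{3.1}$$

and

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \max_{\theta} \left| \frac{\partial^2 p_n}{\partial \theta^2} \right| dx_1 \cdots dx_n < \infty \tag{3.2}$$

for any $n$.

Assumption 3. The integral

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left[ \frac{\partial \log q_n(x_1, \cdots, x_n \mid \theta, F_n)}{\partial \theta} \right]^2 q_n(x_1, \cdots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n$$

exists for any $\theta$, $F_n$ and $n$, where $q_n$ is defined by (2.5).

Since

$$\frac{\partial^2 \log q_n}{\partial \theta^2} = \frac{1}{q_n} \frac{\partial^2 q_n}{\partial \theta^2} - \left( \frac{\partial \log q_n}{\partial \theta} \right)^2$$

and since, because of Assumptions 1 and 2,

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial^2 q_n}{\partial \theta^2}\, dx_1 \cdots dx_n = 0,$$

we have

$$-\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \frac{\partial \log q_n}{\partial \theta} \right)^2 q_n\, dx_1 \cdots dx_n. \tag{3.3}$$

Let

$$c_n(\theta) = \underset{F_n}{\text{g.l.b.}} \left\{ -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n \right\}. \tag{3.4}$$

Clearly $c_n(\theta) \ge 0$. We shall now show that

$$c_{n+1}(\theta) \ge c_n(\theta) \qquad \text{for } n = 1, 2, \cdots, \text{ad inf.} \tag{3.5}$$

In fact, we can write

$$-\frac{\partial^2 \log q_{n+1}(x_1, \cdots, x_{n+1} \mid \theta, F_{n+1})}{\partial \theta^2} = -\frac{\partial^2 \log q_n(x_1, \cdots, x_n \mid \theta, F_n^*)}{\partial \theta^2} - \frac{\partial^2 \log f_{n+1}(x_{n+1} \mid x_1, \cdots, x_n, \theta, F_{n+1})}{\partial \theta^2} \tag{3.6}$$

where $F_n^* = \lim_{\xi_{n+1} = \infty} F_{n+1}(\xi_1, \cdots, \xi_{n+1} \mid \theta)$ and $f_{n+1}(x_{n+1} \mid x_1, \cdots, x_n, \theta, F_{n+1})$ is the conditional probability density function of $X_{n+1}$ given the values of $X_1, \cdots, X_n$ and assuming that the joint density function of $X_1, \cdots, X_{n+1}$ is given by $q_{n+1}(x_1, \cdots, x_{n+1} \mid \theta, F_{n+1})$. Since $c_n(\theta) \le$ expected value of

$$-\frac{\partial^2 \log q_n(x_1, \cdots, x_n \mid \theta, F_n^*)}{\partial \theta^2}$$

and since the expected value of $-\dfrac{\partial^2 \log f_{n+1}}{\partial \theta^2}$ is $\ge 0$, inequality (3.5) must hold.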

ud i 

In analogy with R. A. Fisher's information function, we shall call $c_n(\theta)$ the amount of information contained in the first $n$ observations regarding $\theta$. We shall now prove the following theorem:

Theorem 3.1. If $\lim_{n = \infty} c_n(\theta) \le c < \infty$ over a finite non-degenerate $\theta$-interval $I$, then there is no uniformly consistent estimate of $\theta$.

Proof. If for any $n$, $c_n(\theta) \le c < \infty$ over the interval $I$, for each $n$ there exists a distribution $F_n(\xi_1, \cdots, \xi_n \mid \theta)$ such that

$$0 \le -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial^2 \log q_n}{\partial \theta^2}\, q_n(x_1, \cdots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n \le c + 1 \tag{3.7}$$

for all $n$ and for all $\theta$ in $I$. Let $t_n$ be any estimate and let

$$b_n(\theta) = E(t_n - \theta) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (t_n - \theta)\, q_n(x_1, \cdots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} t_n\, q_n(x_1, \cdots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n - \theta. \tag{3.8}$$

Since $t_n$ is bounded, it follows from Assumptions 1 and 2 that $\dfrac{db_n(\theta)}{d\theta}$ exists and is a continuous function of $\theta$. According to a theorem by Cramér [2] we have

$$E(t_n - \theta)^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (t_n - \theta)^2\, q_n\, dx_1 \cdots dx_n \ge \frac{\left(1 + \dfrac{db_n}{d\theta}\right)^2}{c + 1} \tag{3.9}$$

for all $\theta$ in $I$. Thus, in order that $\lim_{n = \infty} E(t_n - \theta)^2 = 0$ uniformly in $\theta$, we must have

$$\lim_{n = \infty} \frac{db_n(\theta)}{d\theta} = -1 \tag{3.10}$$

uniformly in $\theta$ over $I$. Let $I$ be the interval ranging from $g$ to $h$ $(g < h)$. From (3.10) it follows that

$$\lim_{n = \infty}\, [b_n(h) - b_n(g)] = g - h. \tag{3.11}$$

Hence

$$\liminf_{n = \infty}\, \max_{\theta \text{ in } I}\, [b_n(\theta)]^2 \ge \frac{(h - g)^2}{4}.$$

Since $E(t_n - \theta)^2 \ge [b_n(\theta)]^2$, $E(t_n - \theta)^2$ cannot converge to zero uniformly in $\theta$ and Theorem 3.1 is proved.

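4. Formula for $c_n(\theta)$ when $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ is equal to $\varphi_1(x_1 \mid \theta, \xi_1)\, \varphi_2(x_2 \mid \theta, \xi_2) \cdots \varphi_n(x_n \mid \theta, \xi_n)$. Let $g_i(x_i \mid x_1, \cdots, x_{i-1}, \theta, F_n)$ be the conditional probability density of $X_i$ given $X_1, \cdots, X_{i-1}$ when the joint density function of $X_1, \cdots, X_n$ is given by $q_n(x_1, \cdots, x_n \mid \theta, F_n)$, $(i \le n)$. Clearly,

$$q_n(x_1, \cdots, x_n \mid \theta, F_n) = \prod_{i=1}^{n} g_i(x_i \mid x_1, \cdots, x_{i-1}, \theta, F_n). \tag{4.1}$$

Now

$$g_i(x_i \mid x_1, \cdots, x_{i-1}, \theta, F_n) = \int_{-\infty}^{\infty} \varphi_i(x_i \mid \theta, \xi_i)\, dH_i(\xi_i \mid x_1, \cdots, x_{i-1}, \theta, F_n) \tag{4.2}$$

where $H_i(\xi_i \mid x_1, \cdots, x_{i-1}, \theta, F_n)$ denotes the conditional cumulative distribution of $\xi_i$ given $x_1, \cdots, x_{i-1}$, assuming that $F_n(\xi_1, \cdots, \xi_n \mid \theta)$ is the joint cumulative distribution of $\xi_1, \cdots, \xi_n$ and $p_n(x_1, \cdots, x_n \mid \theta, \xi_1, \cdots, \xi_n)$ is the joint density of $X_1, \cdots, X_n$ for any given values of $\theta, \xi_1, \cdots, \xi_n$.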

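It follows from (4.2) that

$$-E\left[ \frac{\partial^2 \log g_i}{\partial \theta^2} \right] \ge c_{ni}(\theta) = \underset{C_i(\xi_i)}{\text{g.l.b.}} \left\{ -\int_{-\infty}^{\infty} \frac{\partial^2 \log \int_{-\infty}^{\infty} \varphi_i(x_i \mid \theta, \xi_i)\, dC_i(\xi_i)}{\partial \theta^2} \left[ \int_{-\infty}^{\infty} \varphi_i\, dC_i \right] dx_i \right\} \tag{4.3}$$

where $C_i(\xi_i)$ may be any cumulative distribution of $\xi_i$. Hence

$$c_n(\theta) \ge \sum_{i=1}^{n} c_{ni}(\theta)$$

and, therefore,

$$c_n(\theta) = \sum_{i=1}^{n} c_{ni}(\theta). \tag{4.4}$$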

The quantity $c_{ni}(\theta)$ is simply the amount of information contained in the $i$th observation alone. Thus, formula (4.4) says that if $X_1, \cdots, X_n$ are independent, the total information contained in the first $n$ observations is equal to the sum of the amounts of information contained in each of these observations singly.
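
The additivity expressed by (4.4) can be illustrated numerically in the simplest special case. The sketch below is only an illustration under assumptions not in the text: there are no incidental parameters (so the g.l.b. in (3.4) is trivial and the amount of information reduces to Fisher's), the two observations are normal with mean $\theta$ and hypothetical standard deviations 1 and 2, and the integrals are approximated by a midpoint rule. All function names are invented for the example.

```python
import math


def normal_logpdf(x, mean, sd):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def score(x, theta, sd, h=1e-4):
    """Central-difference approximation to d/d(theta) of log p(x | theta)."""
    return (normal_logpdf(x, theta + h, sd) - normal_logpdf(x, theta - h, sd)) / (2.0 * h)


def quadrature_nodes(theta, sd, half_width=8.0, steps=200):
    """Midpoint-rule pairs (density * dx, score) covering theta +/- half_width * sd."""
    dx = 2.0 * half_width * sd / steps
    nodes = []
    for k in range(steps):
        x = theta - half_width * sd + (k + 0.5) * dx
        nodes.append((math.exp(normal_logpdf(x, theta, sd)) * dx, score(x, theta, sd)))
    return nodes


def information_single(theta, sd):
    """Integral of (d log p / d theta)^2 * p for one observation."""
    return sum(w * s * s for w, s in quadrature_nodes(theta, sd))


def information_pair(theta, sd1, sd2):
    """The same integral for the joint density of two independent observations."""
    nodes2 = quadrature_nodes(theta, sd2)
    total = 0.0
    for w1, s1 in quadrature_nodes(theta, sd1):
        for w2, s2 in nodes2:
            total += w1 * w2 * (s1 + s2) ** 2
    return total


theta = 0.3
i1 = information_single(theta, 1.0)
i2 = information_single(theta, 2.0)
i12 = information_pair(theta, 1.0, 2.0)
print(i1, i2, i12)
```

In this special case the information in the pair should agree with the sum of the two single-observation amounts up to quadrature error.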


REFERENCES 

[1] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1946).

[2] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.



INVERSION FORMULAE FOR THE DISTRIBUTION OF RATIOS 

By John Gurland
University of California, Berkeley

1. Summary. The use of the repeated Cauchy principal value affords greater facility in the application of inversion formulae involving characteristic functions. Formula (2) below is especially useful in obtaining the inversion formula (1) for the distribution of the ratio of linear combinations of random variables which may be correlated. Formulae (1), (10), (12) generalize the special cases considered by Cramér [2], Curtiss [4], Geary [6], and are free of some restrictions they impose. The results are further generalized in section 6, where inversion formulae are given for the joint distribution of several ratios. In section 7, the joint distribution of several ratios of quadratic forms in random variables $X_1, X_2, \cdots, X_n$ having a multivariate normal distribution is considered.


2. Introduction. We shall write

$$\fint \fint \cdots \fint g(t_1, t_2, \cdots, t_n)\, dt_1\, dt_2 \cdots dt_n = \lim_{\substack{\epsilon_i \to 0 \\ T_i \to \infty}} \int \int \cdots \int_{\epsilon_i < |t_i| < T_i} g(t_1, t_2, \cdots, t_n)\, dt_1\, dt_2 \cdots dt_n,$$

which might be called the repeated Cauchy principal value of

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(t_1, t_2, \cdots, t_n)\, dt_1 \cdots dt_n,$$

and which we shall use frequently. The results of this article may be regarded as extensions of the following theorem proved in section 4.

Theorem 1. Let $X_1, X_2, \cdots, X_n$ have the joint distribution function $F(x_1, x_2, \cdots, x_n)$ with corresponding characteristic function $\varphi(t_1, t_2, \cdots, t_n)$. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, where $a_1, a_2, \cdots, a_n, b_1, b_2, \cdots, b_n$ are real numbers. If $P\{\sum b_jX_j \le 0\} = 0$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\varphi\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\}}{t}\, dt. \tag{1}$$

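3. An inversion formula for distribution functions. Let $F(x)$ be a distribution function and $\varphi(t)$ be the corresponding characteristic function. Then the following inversion formula holds:

$$F(\xi) + F(\xi - 0) = 1 - \frac{1}{\pi i} \fint \frac{e^{-it\xi} \varphi(t)}{t}\, dt. \tag{2}$$

Proof.

$$\left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \frac{e^{-it\xi} \varphi(t)}{t}\, dt = \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \frac{e^{-it\xi}}{t} \int_{-\infty}^{\infty} e^{itx}\, dF(x)\, dt = \int_{-\infty}^{\infty} \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \frac{e^{it(x - \xi)}}{t}\, dt\, dF(x)$$

by the Fubini theorem on the inversion of integrals. But

$$\left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \frac{e^{it(x - \xi)}}{t}\, dt = 2i \int_{\epsilon}^{T} \frac{\sin t(x - \xi)}{t}\, dt,$$

and since $\int_{\epsilon}^{T} \dfrac{\sin t(x - \xi)}{t}\, dt$ is uniformly bounded in $\epsilon$ and $T$, the principle of bounded convergence for Lebesgue integrals implies that

$$\frac{1}{\pi i} \fint \frac{e^{-it\xi} \varphi(t)}{t}\, dt = \int_{-\infty}^{\infty} \operatorname{sgn}(x - \xi)\, dF(x) = \left( -\int_{-\infty}^{\xi - 0} + \int_{\xi + 0}^{\infty} \right) dF(x) = -F(\xi - 0) + 1 - F(\xi).$$

The required result follows at once.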
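
Formula (2) lends itself to a simple numerical check. At a continuity point $\xi$ the left member is $2F(\xi)$, and pairing $t$ with $-t$ in the principal value gives $F(\xi) = \tfrac12 - \tfrac{1}{\pi}\int_0^\infty \operatorname{Im}\{e^{-it\xi}\varphi(t)\}\,t^{-1}\,dt$. The Python sketch below evaluates this integral by a midpoint rule for the standard normal characteristic function; the function name, the truncation point `upper` and the step count are assumptions of the sketch, not part of the text.

```python
import cmath
import math


def cdf_from_cf(cf, xi, upper=12.0, steps=12000):
    """F(xi) at a continuity point, from formula (2) with the principal value
    folded onto (0, upper]; midpoint rule, so t = 0 is never evaluated."""
    dt = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (cmath.exp(-1j * t * xi) * cf(t)).imag / t * dt
    return 0.5 - total / math.pi


def normal_cf(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * t * t)


value = cdf_from_cf(normal_cf, 1.0)
exact = 0.5 * math.erfc(-1.0 / math.sqrt(2.0))   # normal distribution function at 1
print(value, exact)
```

The truncation at `upper = 12` is harmless here only because the normal characteristic function decays very rapidly; a slowly decaying characteristic function would need a larger range or a different quadrature.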

Another form of (2) may be obtained as follows: Let $H(x)$, $K(x)$ be distribution functions, and $\psi(t)$, $\chi(t)$ the corresponding characteristic functions. Setting $F = H$, $\varphi = \psi$, $\xi = 0$ in (2) yields

$$H(0) + H(0 - 0) = 1 - \frac{1}{\pi i} \fint \frac{\psi(t)}{t}\, dt,$$

while setting $F = K$, $\varphi = \chi$ in (2) yields

$$K(\xi) + K(\xi - 0) = 1 - \frac{1}{\pi i} \fint \frac{e^{-it\xi} \chi(t)}{t}\, dt.$$

Clearly

$$K(\xi) + K(\xi - 0) = H(0) + H(0 - 0) + \frac{1}{\pi i} \fint \frac{\psi(t) - e^{-it\xi} \chi(t)}{t}\, dt. \tag{3}$$

If $H = K$, then $\psi = \chi$, and (3) reduces to a well-known inversion formula (cf. Kendall [7, p. 91]).


4. Distribution of the ratio $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$ with denominator positive. Theorem 1. Let $X_1, X_2, \cdots, X_n$ have the joint distribution function $F(x_1, x_2, \cdots, x_n)$ with corresponding characteristic function $\varphi(t_1, t_2, \cdots, t_n)$. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$ where $a_1, a_2, \cdots, a_n, b_1, b_2, \cdots, b_n$ are real numbers. If $P\{\sum b_jX_j \le 0\} = 0$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\varphi\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\}}{t}\, dt.$$

Proof. Note that

$$G(x) = P\left\{ \sum (a_i - b_ix) X_i \le 0 \right\},$$

and let $R_x(\xi) = P\{\sum (a_i - b_ix) X_i \le \xi\}$ and $\chi_x(t)$ be the corresponding characteristic function. Clearly $R_x(0) = G(x)$ and

$$\chi_x(t) = \varphi\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\}.$$

On applying (2) to $R_x(\xi)$ and setting $\xi = 0$, the required result follows at once. If (3) is applied in place of (2), with $K = R_x$ and $\xi = 0$, we obtain

$$G(x) + G(x - 0) = H(0) + H(0 - 0) + \frac{1}{\pi i} \fint \frac{\psi(t) - \varphi\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\}}{t}\, dt. \tag{4}$$

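We shall consider (1) and (4) when $n = 2$ and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Two cases will be treated separately; first, when $X_1$, $X_2$ are independent, second, when $X_1$, $X_2$ may be correlated.

If $X_1$, $X_2$ are independent, and $F(x_1, \infty) = F_1(x_1)$, $F(\infty, x_2) = F_2(x_2)$, with corresponding characteristic functions $\varphi_1(t)$, $\varphi_2(t)$, then (1) becomes

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\varphi_1(t)\, \varphi_2(-tx)}{t}\, dt \tag{5}$$

while (4) becomes, taking $H = F_2$,

$$G(x) + G(x - 0) = \frac{1}{\pi i} \fint \frac{\varphi_2(t) - \varphi_1(t)\, \varphi_2(-tx)}{t}\, dt. \tag{6}$$

Cramér [2, p. 46] proves, for $X_1$, $X_2$ independent and $F_2(0) = 0$, that

$$G(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\varphi_2(t) - \varphi_1(t)\, \varphi_2(-tx)}{t}\, dt$$

under the following conditions:

(i) $X_1$ and $X_2$ have finite means;

(ii) $\displaystyle \int_{1}^{\infty} \left| \frac{\varphi_2(t)}{t} \right| dt < \infty$.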
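
Formula (5) can be tried on a case with a known answer. The Python sketch below is an illustration under assumptions not in the text: $X_1$ is taken standard normal and $X_2$ exponential with mean 1 (so that $P\{X_2 \le 0\} = 0$), the principal value is folded onto $t > 0$ and approximated by a midpoint rule, and the function names are invented. For this pair, conditioning on $X_1$ gives $P\{X_1/X_2 \le 1\} = \tfrac12 + \tfrac12 e^{1/2} \operatorname{erfc}(1/\sqrt{2})$, which serves as the comparison value.

```python
import math


def ratio_cdf(cf1, cf2, x, upper=12.0, steps=24000):
    """G(x) for X1/X2 at a continuity point, from formula (5):
    G(x) = 1/2 - (1/pi) * integral over t > 0 of Im{cf1(t) cf2(-t x)} / t."""
    dt = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (cf1(t) * cf2(-t * x)).imag / t * dt
    return 0.5 - total / math.pi


def cf_normal(t):
    """Characteristic function of the standard normal distribution."""
    return complex(math.exp(-0.5 * t * t), 0.0)


def cf_exponential(t):
    """Characteristic function of the exponential distribution with mean 1."""
    return 1.0 / (1.0 - 1j * t)


g_at_1 = ratio_cdf(cf_normal, cf_exponential, 1.0)
closed_form = 0.5 + 0.5 * math.exp(0.5) * math.erfc(1.0 / math.sqrt(2.0))
print(g_at_1, closed_form)
```

At $x = 0$ the integrand vanishes identically and the sketch returns $\tfrac12$, as it should, since $P\{X_1/X_2 \le 0\} = P\{X_1 \le 0\}$ when $X_2$ is positive.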



If $X_1$, $X_2$ may be correlated, then (1) becomes

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\varphi(t, -tx)}{t}\, dt; \tag{7}$$

while (4) becomes, taking $H = F_2$,

$$G(x) + G(x - 0) = \frac{1}{\pi i} \fint \frac{\varphi(0, t) - \varphi(t, -tx)}{t}\, dt. \tag{8}$$

Professor P. L. Hsu, in a course of lectures attended by the author at the Statistical Laboratory, University of California, gave the following result of Cramér, which was stated thus, using the above notation:

$$G(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\varphi_2(t) - \chi_x(t)}{t}\, dt, \tag{9}$$

provided $\displaystyle \int_{-\infty}^{\infty} \left| \frac{\varphi_2(t) - \chi_x(t)}{t} \right| dt < \infty$, where $F_2(0) = 0$ and $\chi_x(t)$ is defined above expression (4).

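The following corollary is obtained from (1) according to well-known theorems concerning differentiation under the integral sign:

Corollary. Suppose $\varphi(t_1, t_2, \cdots, t_n)$ is the characteristic function corresponding to $X_1, X_2, \cdots, X_n$, and $G(x)$ is the distribution function of

$$(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n);$$

then, if $P\{\sum b_jX_j \le 0\} = 0$,

$$G'(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \sum_{k=1}^{n} b_k \left[ \frac{\partial \varphi(t_1, \cdots, t_n)}{\partial t_k} \right]_{t_j = t(a_j - b_jx)} dt, \tag{10}$$

in every interval in which the integral converges uniformly.

If $n = 2$, and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

then

$$G'(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \left[ \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right]_{t_1 = t,\; t_2 = -tx} dt. \tag{11}$$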

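Cramér [3, p. 317, exercise 6] states the following result:

If $F(x_1, x_2) = \displaystyle \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(u, v)\, du\, dv$, and $F_2(0) = 0$, then

$$G'(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \left[ \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right]_{t_2 = -t_1x} dt_1$$

if the integral is uniformly convergent with respect to $x$.

Geary [6] has shown that if $F(x_1, x_2) = \displaystyle \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(u, v)\, du\, dv$, $F_2(0) = 0$, and $\lambda(t, v) = \displaystyle \int_{-\infty}^{\infty} e^{itu} f(u, v)\, du$, then

$$G'(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \left[ \frac{\partial \varphi(t_1, t_2)}{\partial t_2} \right]_{t_2 = -t_1x} dt_1$$

provided

(i) $\varphi(t_1, t_2) = 0$ for $t_2 = \pm\infty$,

(ii) $\displaystyle \int dy \int y\, \lambda(t, y)\, e^{-ityx}\, dt = \int dt \int y\, \lambda(t, y)\, e^{-ityx}\, dy$.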

Formula (1) can be employed in the case $n = 2$, $X_1$, $X_2$ are independent, and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

to obtain closed expressions for the distribution functions of ratios in which the variable in the numerator and that in the denominator may have any one of the following four distributions: Binomial, Rectangular, $\chi^2$, Normal. In the case of the four ratios with the binomially distributed variable as the denominator, a translation must be made to ensure positiveness of the denominator. For the four ratios with the normally distributed variable as denominator, the distribution function obtained is approximate; and the approximation is good if $P\{X_2 \le 0\}$ is sufficiently small (cf. Geary [5]).


5. Distribution of the ratio $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, with denominator positive or negative. The following theorem will be proven:

Theorem 2. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$ where $a_1, a_2, \cdots, a_n, b_1, b_2, \cdots, b_n$ are real numbers. If $P\{\sum b_jX_j = 0\} = 0$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \fint \frac{\varphi^{+}\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\} + \varphi^{-}\{t(b_1x - a_1), \cdots, t(b_nx - a_n)\}}{t}\, dt \tag{12}$$

where

$$\varphi^{+}(t_1, t_2, \cdots, t_n) = \int \int \cdots \int_{\sum b_kx_k > 0} e^{i(t_1x_1 + \cdots + t_nx_n)}\, dF(x_1, x_2, \cdots, x_n),$$

$$\varphi^{-}(t_1, t_2, \cdots, t_n) = \int \int \cdots \int_{\sum b_kx_k < 0} e^{i(t_1x_1 + \cdots + t_nx_n)}\, dF(x_1, x_2, \cdots, x_n).$$

Proof. Let

$$R_x(\xi) = P\left\{\sum b_kX_k > 0\right\} \cdot P\left\{\sum X_k(a_k - b_kx) \le \xi \,\Big|\, \sum b_kX_k > 0\right\} + P\left\{\sum b_kX_k < 0\right\} \cdot P\left\{\sum X_k(a_k - b_kx) \ge -\xi \,\Big|\, \sum b_kX_k < 0\right\}.$$

Then $R_x(\infty) = 1$, $R_x(-\infty) = 0$, and $R_x(\xi)$ is non-decreasing in $\xi$ and continuous on the right. Hence $R_x(\xi)$ is a distribution function (Cramér [2, p. 11]). It can be shown by a proof analogous to that used by Curtiss [4] that the characteristic function of $R_x(\xi)$ is

$$\varphi^{+}\{t(a_1 - b_1x), \cdots, t(a_n - b_nx)\} + \varphi^{-}\{t(b_1x - a_1), \cdots, t(b_nx - a_n)\}.$$

Since $R_x(0) = G(x)$, application of (2) to $R_x(\xi)$ yields the required result.


6. Inversion formulae for multidimensional distribution functions. The $n$-dimensional analogue of (2) will now be given, and will be applied to obtain inversion formulae for the joint distribution of several ratios.

Let $X_1, X_2, \cdots, X_n$ have the joint distribution function $F(x_1, x_2, \cdots, x_n)$ and the corresponding characteristic function $\varphi(t_1, t_2, \cdots, t_n)$. Let

$$\varphi_{j_1, j_2, \cdots, j_k}(t_1, t_2, \cdots, t_k)$$

be the characteristic function corresponding to the marginal joint distribution function of $X_{j_1}, X_{j_2}, \cdots, X_{j_k}$, where the set $j_1, j_2, \cdots, j_k$ is a permutation of $k$ of the integers $1, 2, \cdots, n$. Note that

$$\varphi(t_1, t_2, \cdots, t_n) = \varphi_{1, 2, \cdots, n}(t_1, t_2, \cdots, t_n).$$

The summation $\sum_{(i_1, i_2, \cdots, i_n)} F(\xi_{1i_1}, \xi_{2i_2}, \cdots, \xi_{ni_n})$, which will appear below, is to be interpreted as follows: Defining

$$\xi_{ji_j} = \xi_j \ \text{ if } i_j = 1, \qquad \xi_{ji_j} = \xi_j - 0 \ \text{ if } i_j = 0,$$

then $\sum_{(i_1, i_2, \cdots, i_n)} F(\xi_{1i_1}, \xi_{2i_2}, \cdots, \xi_{ni_n})$ will mean that the summation is to be taken over all binary numbers $i_1i_2 \cdots i_n$.

Using the notation of the preceding paragraph, we can state the following theorem:

Theorem 3. Let $A_0, A_1, \cdots, A_n$ satisfy the $n + 1$ equations


"Z l (” k r ^A T+k « 1, A„ = -1, (r = 0,1,2, ■•,»- 1), 

where, as usual, $\binom{\cdot}{\cdot}$ denotes the binomial coefficient.

Then 


(- 1 ) 


»+i 


z fuu, 


> £*»»*) — 


+ X^ 


h<h <•• 


<7* 




•exp{—+ 


+ **£/*) l4w ••**(*! > > 


>fc) 


dii dfe • • • eft* 
^ • • • £* 


( 13 ) 



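Proof: Since the theorem is already proved for $n = 1$ (section 3), and since

$$\fint \fint \cdots \fint e^{-i(t_1\xi_1 + \cdots + t_n\xi_n)}\, \varphi(t_1, t_2, \cdots, t_n)\, \frac{dt_1\, dt_2 \cdots dt_n}{t_1t_2 \cdots t_n} = (\pi i)^n \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \operatorname{sgn}(x_1 - \xi_1)\, \operatorname{sgn}(x_2 - \xi_2) \cdots \operatorname{sgn}(x_n - \xi_n)\, dF(x_1, x_2, \cdots, x_n),$$

the theorem could be proven by induction. The result is obtained more quickly, however, by noting that it suffices to consider the case of independent $X_1, X_2, \cdots, X_n$.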

It may be remarked that if $(\xi_1, \xi_2, \cdots, \xi_n)$ is a continuity point of $F(x_1, x_2, \cdots, x_n)$, the left-hand member of (13) becomes

$$(-1)^{n+1}\, 2^n F(\xi_1, \xi_2, \cdots, \xi_n),$$

and also that differentiation of (13) yields

$$\frac{\partial^n F(\xi_1, \xi_2, \cdots, \xi_n)}{\partial \xi_1\, \partial \xi_2 \cdots \partial \xi_n} = \frac{1}{(2\pi)^n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-i(t_1\xi_1 + \cdots + t_n\xi_n)}\, \varphi(t_1, t_2, \cdots, t_n)\, dt_1\, dt_2 \cdots dt_n, \tag{14}$$

in every $n$-dimensional interval in which the integral converges uniformly. This agrees with well-known results concerning Fourier inversion formulae.

An inversion formula for the joint distribution of $p$ ratios

$$\frac{a_{1i}X_1 + a_{2i}X_2 + \cdots + a_{ni}X_n}{b_{1i}X_1 + b_{2i}X_2 + \cdots + b_{ni}X_n}, \qquad i = 1, 2, \cdots, p \quad (1 \le p \le n),$$

can be obtained from (13) by a method similar to that applied in section 4. The following theorem holds:

Theorem 4. Let 


and $\phi(t_1, t_2, \ldots, t_n)$ be the characteristic function corresponding to
$X_1, X_2, \ldots, X_n$. Then, if $P\{\sum_j b_{ji} X_j \le 0\} = 0$ $(i = 1, 2, \ldots, p)$,

(-l)** 1 £ G (( Ul , f«., • • ■ , fe*) 


( 15 ) 







INVERSION FORMULAS 




The following corollary is a generalization of (10) and follows by differentiation
of (15):

Corollary. Suppose $G(x_1, x_2, \ldots, x_p)$ is the joint distribution function of
the $p$ ratios

$$\frac{a_{1j}X_1 + \cdots + a_{nj}X_n}{b_{1j}X_1 + \cdots + b_{nj}X_n},$$

and $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding to $X_1, X_2, \ldots, X_n$;
then, if $P\{\sum_i b_{ij} X_i \le 0\} = 0$, $j = 1, 2, \ldots, p$,


(16) 


■••&)./iy u.x 

\2 id) jj j 

{§ bnl,n '' - K * 0(4 •••*-- 
in every p-dimensional interval in which the integral converges uniformly. 


7. Joint distribution of ratios of quadratic forms. Let $X_1, X_2, \ldots, X_n$
have the joint probability density function

$$f(x) = \frac{(\det B)^{1/2}}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} x B x'},$$

where $x = (x_1, x_2, \ldots, x_n)$ and $B$ is a positive definite symmetric matrix. Suppose
$Q$ is a positive semi-definite symmetric matrix of rank $r \le n$ and
$L_1, L_2, \ldots, L_p$ is a set of symmetric matrices. We wish to obtain the joint
distribution function $G(\xi_1, \xi_2, \ldots, \xi_p)$ of the $p$ ratios

$$\frac{XL_1X'}{XQX'}, \quad \frac{XL_2X'}{XQX'}, \quad \ldots, \quad \frac{XL_pX'}{XQX'},$$

where $X = (X_1, X_2, \ldots, X_n)$.

The existence of an orthogonal matrix $S$ such that $SQS' = I^{(r)}$, where $I^{(r)}$ is
the diagonal matrix having the first $r$ diagonal elements equal to unity and the
rest equal to zero, is well-known. Let $X = YS$, $C = SBS'$, $M_i = SL_iS'$, and
note that $C$ and the $M_i$ are symmetric matrices. Also

$$G(\xi_1, \xi_2, \ldots, \xi_p) = P\left\{ \frac{YM_1Y'}{YI^{(r)}Y'} \le \xi_1, \ \ldots, \ \frac{YM_pY'}{YI^{(r)}Y'} \le \xi_p \right\},$$

where $Y = (Y_1, Y_2, \ldots, Y_n)$ has the probability density function

$$g(y) = \frac{(\det C)^{1/2}}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} y C y'}$$

and $y = (y_1, y_2, \ldots, y_n)$.

Suppose the $L_i$ mutually commute in pairs. Then so do the $M_i$; for $M_iM_j =
SL_iS'SL_jS' = SL_iL_jS' = SL_jL_iS' = SL_jS'SL_iS' = M_jM_i$, since $S$ is orthogonal.
Hence, there is an orthogonal matrix $U$ which simultaneously reduces
each $M_i$ to diagonal form; that is, $N_i = UM_iU'$ is a diagonal matrix (cf. Weyl
[8, p. 25]).

Let $Y = ZU$, $D = UCU'$, so that


ZNJJ = ± Ni Z) 
i -1 

0(6 • 

where v/ 1 = 1 if j < r; 

= 0 if j > r, 


vJ;) y* 
^ v i &i 


<fp 


}■ 


and $Z = (Z_1, Z_2, \ldots, Z_n)$ has the probability density function

$$h(z) = \frac{(\det D)^{1/2}}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} z D z'},$$

where $z = (z_1, z_2, \ldots, z_n)$.

We can now apply the results of section 6. If $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic
function corresponding to the joint distribution function of $Z_1^2, Z_2^2, \ldots, Z_n^2$,
it is clear that

$$\phi(t_1, t_2, \ldots, t_n) = \left[ \frac{\det D}{\det(D - 2iT)} \right]^{1/2},$$

where $T$ is the diagonal matrix whose diagonal elements are $t_1, t_2, \ldots, t_n$. Applying
(15), with 0 = 0, we obtain, since $G$ is obviously a continuous distribution
function


(-l) P+1 2 p Gfe, & 


(17) 




* 

, 23 ^j(Mny. £y.) 

1 1 


chtfi • • • dw* 
Wi ti>2 • • • w* 


-a + £A, z 44-4 

_ detZ) _ 

det | dafi — 2idafi 23 *“ fy) j 


|* dWjdWj ... 

W%Wi • • ' Wk 1 


where $D = [d_{\alpha\beta}]$ and $\delta_{\alpha\beta}$ is the Kronecker delta.

It is, of course, evident that a result analogous to (17) could be obtained by
considering $p$ ratios

$$\frac{XL_1X'}{XQ_1X'}, \quad \frac{XL_2X'}{XQ_2X'}, \quad \ldots, \quad \frac{XL_pX'}{XQ_pX'},$$

where the $2p$ matrices $L_1, L_2, \ldots, L_p, Q_1, Q_2, \ldots, Q_p$ are symmetric and
mutually commute in pairs, and $Q_1, Q_2, \ldots, Q_p$ are positive semi-definite.

In the case $p = 1$ in (17) and for special classes of matrices $L_1$, $Q_1$, $B$ the calculus
of residues may be employed to obtain closed expressions for the distribution
of

$$\frac{XL_1X'}{XQ_1X'}.$$

Formula (17) can be applied to obtain the joint distribution of serial correla¬ 
tion coefficients with different lags. The author plans to incorporate these 
results with those mentioned at the end of section 4 in a forthcoming paper, 
written jointly with Roy B. Leipnik. 

The author wishes to acknowledge the valuable criticism of Professor H. Lewy, 
and especially the constructive advice and suggestions of Roy B. Leipnik. 

REFERENCES 

[1] M. Bôcher, Introduction to Higher Algebra, Macmillan Co., New York, 1907.

[2] H. Cramér, “Random variables and probability distributions”, Cambridge Tracts in Mathematics, No. 36, Cambridge, 1937.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] J. H. Curtiss, “Distribution of a quotient”, Annals of Math. Stat., Vol. 12 (1941).

[5] R. C. Geary, “The frequency distribution of the quotient of two normal variates”, Roy. Stat. Soc. Jour., Vol. 103 (1930).

[6] R. C. Geary, “Extension of a theorem by Harald Cramér”, Roy. Stat. Soc. Jour., Vol. 107 (1944).

[7] M. G. Kendall, The Advanced Theory of Statistics, Griffin Co., London, 1943.

[8] H. Weyl, The Theory of Groups and Quantum Mechanics, Methuen and Co., Ltd., London, 1931.



THE FACTORIAL APPROACH TO THE WEIGHING PROBLEM 1 


By O. Kempthorne
Iowa State College 

1. Summary. The weighing problem is discussed from the point of view of 
factorial experimentation. The paper contains a brief description of the fractional
replication of the $2^n$ factorial system. It is shown that optimum designs
for the weighing problem may easily be obtained with this approach. This 
approach is valuable in indicating the structure of weighing problem designs, and 
the limited conditions under which such designs can give results of value. 

2. Introduction. Considerable attention has been given recently to the 
problem of weighing a number of light objects on a scale [1,2,3,6]. The problem 
was originally proposed by Yates in his paper on complex experiments [4] as an 
example of a factorial experiment in which interactions between the factors tested 
would not be expected to exist: that is, the weight of say two objects could be 
assumed to be the sum of the weights of the objects weighed separately, after 
taking account of any necessary zero corrections. Such a situation is compara¬ 
tively rare in biological research when, for example, the effect on yield of a parti¬ 
cular crop from the joint application of two nutrients is usually different from the 
sum of the effects of separate applications. In recent years attention has been 
given to the use of fractional replication in factorial experiments [7, 8, 9] and it 
is proposed in this paper to consider the weighing problem from this point of 
view. 


3. The $2^n$ factorial system. A full description of the $2^n$ factorial system was
given by Yates in his technical communication The Design and Analysis of 
Factorial Experiments [5]. Yates was particularly concerned with the analysis 
of such experiments and with the evolution of systems of confounding in order 
to reduce the number of plots in each block. The following brief account is 
given in order to facilitate the discussion of the weighing problem. 

In a single replication of the $2^n$ system all combinations of $n$ factors each at
two levels are tested. With three factors, a, b, c, for example, the following
eight combinations are tested: (1), a, b, ab, c, ac, bc, and abc, where (1) denotes the
control, a the application of treatment a only, ab the application of treatments
a and b, and so on. A set of seven independent comparisons between the eight
test results is given formally by the expansion of the formula

$$\tfrac{1}{4}(a \pm 1)(b \pm 1)(c \pm 1),$$


1 Journal Paper no. J 1548 of the Iowa Agricultural Experiment Station, Ames, Iowa. 
Project No. 890. 




where at least one of the signs is negative. If, for instance, the first sign only is
taken to be negative, a formal expansion gives the expression

$$\tfrac{1}{4}\{abc - bc + ab - b + ac - c + a - (1)\},$$

and this contrast of the observations gives the effect of the factor a averaged
over the presence and absence of the factors b and c, which is denoted by effect A.
Similarly, taking the negative sign in the second bracket only, we get the average
effect B,

$$B = \tfrac{1}{4}\{abc - ac + ab - a + bc - c + b - (1)\}.$$

Taking negative signs in the first and second brackets we obtain the interaction
AB,

$$AB = \tfrac{1}{4}\{abc + c + ab + (1) - ac - bc - a - b\},$$

and so on.
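The formal expansion can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name `contrast`, the helper `example_yield`, and the yields themselves are hypothetical, chosen only so that the effects can be verified by hand.

```python
from itertools import product

FACTORS = "abc"

def contrast(effect, yields):
    """Contrast for an effect such as "A" or "AB" in the 2^3 system.

    `yields` maps each treatment combination ("" stands for the control (1))
    to its observed yield. The sign of each term follows the expansion of
    (a +/- 1)(b +/- 1)(c +/- 1), with a minus sign in the brackets of the
    factors named in `effect`.
    """
    letters = effect.lower()
    total = 0.0
    for levels in product((0, 1), repeat=len(FACTORS)):
        combo = "".join(f for f, x in zip(FACTORS, levels) if x)
        # A named factor that is absent contributes the "-1" of its bracket.
        absent = sum(1 for f, x in zip(FACTORS, levels) if f in letters and not x)
        total += (-1) ** absent * yields[combo]
    return total / 2 ** (len(FACTORS) - 1)

def example_yield(combo):
    """Hypothetical yields: additive effects 2, 3, 5 plus an a-b interaction of 7."""
    y = 10.0
    y += 2.0 if "a" in combo else 0.0
    y += 3.0 if "b" in combo else 0.0
    y += 5.0 if "c" in combo else 0.0
    y += 7.0 if ("a" in combo and "b" in combo) else 0.0
    return y

combos = ["", "a", "b", "ab", "c", "ac", "bc", "abc"]
yields = {c: example_yield(c) for c in combos}

effect_A = contrast("A", yields)    # effect of a averaged over b and c
effect_AB = contrast("AB", yields)  # half the change in the effect of a due to b
```

With these assumed yields the effect of a is 2 when b is absent and 9 when b is present, so the averaged effect A is 5.5 and the interaction AB, under the divisor used above, is 3.5.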

The definition of effects and interactions may be presented very simply in
geometrical terminology, by representing the treatment combinations as points
of an n-dimensional lattice, each axis of the lattice having two points at unit
distance apart. The control treatment will have coordinates $(0, 0, \ldots, 0)$,
the treatment consisting of a only will have coordinates $(1, 0, 0, \ldots, 0)$,
and so on. The effect A is then the difference of the mean yield of the treatments
corresponding to the points lying on the hyperplane

$$x_1 = 0,$$

and the mean yield of those represented by points lying on the hyperplane

$$x_1 = 1.$$

The interaction of two factors a and b, represented by the axes $x_1$ and $x_2$ respectively,
will be obtained from the difference of the mean yields of those plots for
which

$$x_1 + x_2 = 0, \ \text{ or } \ x_1 + x_2 = 2,$$

and those for which

$$x_1 + x_2 = 1.$$

The extensions to the above for three-factor and higher order interactions are
simple. The interaction of factors a, b, and c, which are represented by coordinate
axes $x_1$, $x_2$, and $x_3$, is given by the difference between the mean of plots
represented by points for which

$$x_1 + x_2 + x_3 = 0 \ \text{ or } \ 2,$$

and those represented by points for which

$$x_1 + x_2 + x_3 = 1 \ \text{ or } \ 3;$$

in other words, it is the difference of the mean yields of those plots for which 



$$x_1 + x_2 + x_3 \equiv 0 \pmod 2$$

and of those for which

$$x_1 + x_2 + x_3 \equiv 1 \pmod 2.$$

Each effect or interaction is then defined as the mean difference of two sets of 
plots, each set being represented by points on parallel hyperplanes, and the planes 
of one set of parallel hyperplanes lying between the planes of the other set. It is 
necessary to specify only the direction cosines of the hyperplanes in order to 
specify the effect or interaction, and the usual terminology for effects and interac¬ 
tions follows, in that the interaction of factors a, b , c, for example may be repre¬ 
sented by the symbol ABC. 

In the same way as effects and interactions are defined in terms of the yields 
of the several treatment combinations, the expected yield from each treatment 
combination may be expressed in terms of the mean level of yield and the true 
effects and interactions. If the full set of combinations of the factorial scheme 
is tested, the best estimate of each true effect and interaction is the same func¬ 
tion of the observed yields that the true effect or interaction is of the true yields. 
This fact is one of the advantages which follow from the use of the full factorial 
scheme. 

We are not concerned here with factorial experiments in which the factors have 
more than two levels, but when the number of levels of each factor is the same 
prime number, effects and interactions may be represented by products of powers 
of the symbols for the factors. In the case of two factors (a, b) at three levels,
for example, the main effects may be represented by A, B, and the interactions
by AB and $AB^2$, each symbol referring to the two independent contrasts between
three sets, each of three plots.

As an example of the use of the above representation, we may consider con¬ 
founding, that is, the arrangement of the treatment combinations in blocks in 
order to reduce the experimental error. Suitable arrangements are such that 
contrasts between the blocks represent high order interactions which the experi¬ 
menter is confident will be of negligible size. 

If treatment combinations for which 

$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \equiv 0 \pmod 2$$

and for which

$$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \equiv 0 \pmod 2$$

are arranged in a particular block, then the coordinates of the treatment combinations
in this block also lie on the hyperplane

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n \equiv 0 \pmod 2,$$

where the coefficients $(\alpha_i + \beta_i)$ must be reduced modulo two. If, therefore, the
treatments are arranged in blocks so that two comparisons are block contrasts, 
then the generalised interaction of these contrasts is also a block contrast. 
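The statement about generalised interactions may be illustrated by a short computation. This is a sketch under assumed inputs: the choice of four factors and of the contrasts ABC and BCD is hypothetical, and the helper names are not from the paper.

```python
from itertools import product

def parity(coeffs, point):
    """Value of coeffs . point reduced modulo 2."""
    return sum(c * x for c, x in zip(coeffs, point)) % 2

def generalised_interaction(alpha, beta):
    """Coefficientwise sum of two contrasts, reduced modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(alpha, beta))

n = 4
alpha = (1, 1, 1, 0)   # the contrast ABC
beta = (0, 1, 1, 1)    # the contrast BCD
gamma = generalised_interaction(alpha, beta)   # (1, 0, 0, 1), that is AD

# Treatment combinations placed in the block defined by both parity conditions.
block = [x for x in product((0, 1), repeat=n)
         if parity(alpha, x) == 0 and parity(beta, x) == 0]

# Every point of the block also lies on the hyperplane of the generalised interaction.
all_on_gamma = all(parity(gamma, x) == 0 for x in block)
```

Here the block contains four of the sixteen combinations, and all of them satisfy the parity condition for AD, so that AD is confounded with blocks along with ABC and BCD.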





4. Fractional replication. The principle of fractional replication follows 
very simply. Suppose only those treatment combinations whose yields all occur 
either in the positive or the negative part of a particular contrast are represented 
in the experiment, that is only those combinations represented by the points of 
the lattice for which say 

$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \equiv 0 \pmod 2.$$

Then the comparison between the yields of those plots represented by

$$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \equiv 0 \pmod 2$$

and by

$$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \equiv 1 \pmod 2$$

will be identical with the comparison between the yields of plots represented
by

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n \equiv 0 \pmod 2$$

and by

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n \equiv 1 \pmod 2.$$

The former of these two comparisons may be represented by the symbol
$x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n}$, and the latter by $x_1^{\alpha_1+\beta_1} x_2^{\alpha_2+\beta_2} \cdots x_n^{\alpha_n+\beta_n}$, where $x_1, \ldots, x_n$
are no longer coordinates but symbols for the $n$ factors, which satisfy the relations
$x_i^a = 1$ if $a \equiv 0 \pmod 2$. The equivalence of the two comparisons may be
obtained by the use of an identity relationship in the symbols for the factors

$$I = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n},$$

where $I$ is interpreted as unity, and only those combinations whose coordinates
$(x_1, x_2, \ldots, x_n)$ satisfy one of the equations

$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \equiv 0, \ \text{ or } \ \equiv 1 \pmod 2,$$

are represented in the experiment. If this identity relationship is multiplied
by the symbol $x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n}$ by ordinary commutative algebra, reducing the
powers modulo 2 where necessary, we obtain

$$x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n} = x_1^{\alpha_1+\beta_1} x_2^{\alpha_2+\beta_2} \cdots x_n^{\alpha_n+\beta_n}.$$

It is more convenient to revert to the common use of capital letters A, B, C, etc.
for effects corresponding to small letters a, b, c, etc. for the factors tested. An
experiment in half-replicate is then represented formally by an equation of the
type

$$I = A^{\alpha} B^{\beta} C^{\gamma} \cdots.$$
In such an experiment on $n$ factors only $2^{n-1}$ treatment combinations will be
tested. Of the $2^n - 1$ independent comparisons in a fully replicated experiment,
information on one comparison is lost completely since only those treatments
which appear in the comparison with the same sign are represented:
the remaining $2^n - 2$ independent comparisons of a fully replicated experiment
are identical in pairs giving $2^{n-1} - 1$ independent comparisons. Each comparison
is then said to have two aliases and measures the sum (or difference,
depending on which half of the treatment combinations are used) of two effects,
an effect and an interaction, or two interactions.
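The pairing of comparisons into aliases can be enumerated directly. The sketch below assumes a half-replicate of four factors with the identity relationship I = ABCD; that choice, and the function names, are illustrative only.

```python
from itertools import combinations

def multiply(u, v):
    """Product of two effect symbols, the square of each letter being unity."""
    return "".join(sorted(set(u) ^ set(v))) or "I"

def alias_pairs(factors, defining_word):
    """Alias pairs in a half-replicate defined by I = defining_word."""
    effects = ["".join(c)
               for k in range(1, len(factors) + 1)
               for c in combinations(factors, k)]
    pairs = set()
    for effect in effects:
        if effect == defining_word:
            continue  # information on this comparison is lost completely
        pairs.add(frozenset((effect, multiply(effect, defining_word))))
    return pairs

pairs = alias_pairs("ABCD", "ABCD")
```

For four factors the 15 comparisons of the full replicate lose ABCD and collapse into 7 pairs, in agreement with the count $2^{n-1} - 1$; for instance A is paired with BCD and AB with CD.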

A quarter-replicated experiment can by the same process be represented by an 
identity relationship of the form 

$$I = A^{\alpha_1} B^{\beta_1} C^{\gamma_1} \cdots = A^{\alpha_2} B^{\beta_2} C^{\gamma_2} \cdots = \cdots$$

It is useful in the evolution of fractional designs to note that the elements in the 
identity relationship form an Abelian group. 

Fractionally replicated experiments are formally identical with confounded 
experiments in that block differences may be regarded as additional factors in the 
confounded experiment. A $2^n$ experiment arranged in $2^p$ blocks, for example,
may be regarded as a 1 in $2^p$ design of a $2^{n+p}$ experiment. Considerable care
needs to be exercised in the use of fractionally replicated designs, but they have 
been found to be very useful in agricultural and biological research. 

5. The weighing problem. The problem of weighing a number of objects
may be regarded as the problem of the estimation of the effects of a number of
factors which do not interact. To take a simple case, consider the estimation of
the effects of factors a, b, and c for which one complete replicate would consist
of the combinations

(1), a, b, ab, c, ac, bc, and abc.

Suppose a half replicate design is used, based on the identity relationship

I = ABC.

The combinations tested would then consist of either the set {a, b, c, abc} or the
set {(1), ab, ac, bc}. If the former set were chosen, the comparison estimating
the effect A could also be ascribed to the interaction BC, that estimating effect
B also to the interaction AC, and that estimating effect C to the interaction AB,
as can be observed by multiplying the identity relationship by A, B, and C in
turn. If the experimenter is confident that the two-factor interactions are
negligible, then any effect given by each comparison would be ascribed to the
main effect.
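The two halves and their aliases may be verified by comparing the signs with which each weighing enters a contrast. This is an illustrative sketch; the helper names are hypothetical, and the sign convention is that of the expansion of $(a \pm 1)(b \pm 1)(c \pm 1)$.

```python
COMBOS = ["", "a", "b", "ab", "c", "ac", "bc", "abc"]   # "" stands for (1)

def overlap_parity(combo, word):
    """Parity of the number of letters shared by a combination and an effect."""
    return len(set(combo) & set(word.lower())) % 2

# The two half-replicates defined by I = ABC.
odd_half = [c for c in COMBOS if overlap_parity(c, "ABC") == 1]
even_half = [c for c in COMBOS if overlap_parity(c, "ABC") == 0]

def sign(effect, combo):
    """Sign with which a combination enters the contrast for an effect."""
    absent = sum(1 for f in effect.lower() if f not in combo)
    return (-1) ** absent

def aliases_coincide(e1, e2, runs):
    """True if the two contrasts agree, up to an overall sign, on the given runs."""
    s1 = [sign(e1, c) for c in runs]
    s2 = [sign(e2, c) for c in runs]
    return s1 == s2 or s1 == [-s for s in s2]
```

On either half, A coincides with BC, B with AC, and C with AB, whereas B and AB remain distinct contrasts.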

6. Discussion of a particular case. We give the derivation of a design for
weighing a particular number of objects, say ten. Let the objects be denoted
by a, b, c, d, e, f, g, h, k, l. Then the total number of combinations which could
be tested is $2^{10}$, that is 1024, but as we are confident that interactions are negligible,
it is necessary only to estimate main effects.

A fractionally replicated design must consist of a number $2^p$ of combinations
and this will be a 1 in $2^{10-p}$ design. A suitable fractionally replicated design
consisting of 16 combinations will exist if it is possible to evolve an identity
relationship for a 1 in 64 design, such that each term in the relationship involves
at least three letters. A possible identity relationship for such a design contains
the members of the Abelian group obtained from all combinations of the elements
I, ABC, CDE, EFG, GHK, ADL, and AFH, with the rule that the square of each
letter is to be equated to unity. Each possible comparison may then be due to
any of the 64 effects or interactions which may be derived from this identity
relationship. In other words, each comparison has 64 aliases: in the case of ten
of the comparisons, only one of the aliases is a main effect, and for the remaining
five comparisons the aliases are all interactions of at least two factors. The
actual design may be written down by finding combinations of the letters which
have the same number of letters in common with the unit element and the six
three-factor interactions. These are themselves a group consisting of all combinations
of unity and four combinations of letters. The sixteen combinations
with an even number of letters in common with all the members of the identity
group are found to be the following:


(1)       abdef     acefl     bcdl
abfgkl    degkl     bcegk     acdfgk
fgh       abdegh    aceghl    bcdfghl
abhkl     defhkl    bcefhk    acdhk


The estimation of effects from the results of the sixteen weighings is particu¬ 
larly easy; the weight of object a will be one-eighth of the difference between 
those weighings containing a and those not containing a. There are ten such 
contrasts which estimate the effects, and the remaining five contrasts may be 
used to obtain an estimate of the experimental error. If $\sigma^2$ is the variance of
each weighing, the variance of the weight of a, that is, the effect A, will be
$(1/8 + 1/8)\sigma^2 = (1/4)\sigma^2$. The precision can be increased fourfold in the weighing
problem with a chemical balance by interpreting the absence of each letter as the
placing of the object in the left hand pan and the presence as the placing of the
object in the right hand pan. Each effect will then measure twice the weight
of the corresponding object and the estimated weight of each object will have a
variance of $\sigma^2/16$, that is, the same precision as if each had been weighed by itself
eight times in each pan, or sixteen times in all.
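A small computation may make the estimation rule and the two variances concrete. The sketch below uses the sixteen combinations of the design (with the eight-letter entry taken as bcdfghl, the form that has an even number of letters in common with every member of the identity group). The object weights and the bias of 0.5 are hypothetical, and the readings are taken free of error so that the recovery of the weights can be checked exactly.

```python
LETTERS = "abcdefghkl"
DESIGN = ["", "abdef", "acefl", "bcdl",
          "abfgkl", "degkl", "bcegk", "acdfgk",
          "fgh", "abdegh", "aceghl", "bcdfghl",
          "abhkl", "defhkl", "bcefhk", "acdhk"]   # "" stands for (1)

def spring_reading(combo, weights, bias=0.0):
    """Spring balance: the reading is the bias plus the weights present."""
    return bias + sum(weights[ch] for ch in combo)

def chemical_reading(combo, weights, bias=0.0):
    """Chemical balance: present letters in the right pan, absent in the left."""
    return bias + sum(w if ch in combo else -w for ch, w in weights.items())

def contrast(letter, readings):
    """One-eighth of (weighings containing the letter minus those not)."""
    plus = sum(y for combo, y in zip(DESIGN, readings) if letter in combo)
    minus = sum(y for combo, y in zip(DESIGN, readings) if letter not in combo)
    return (plus - minus) / 8

weights = {ch: float(i + 1) for i, ch in enumerate(LETTERS)}   # hypothetical
spring = [spring_reading(c, weights, bias=0.5) for c in DESIGN]
chemical = [chemical_reading(c, weights, bias=0.5) for c in DESIGN]

spring_estimates = {ch: contrast(ch, spring) for ch in LETTERS}
# With the chemical balance each contrast measures twice the weight.
chemical_estimates = {ch: contrast(ch, chemical) / 2 for ch in LETTERS}

# Variance multipliers of sigma^2: sixteen readings with coefficients of
# magnitude 1/8 (spring) or 1/16 (chemical).
spring_variance_factor = 16 * (1 / 8) ** 2
chemical_variance_factor = 16 * (1 / 16) ** 2
```

Both sets of estimates return the assumed weights, the bias cancelling in each contrast, and the variance multipliers are 1/4 and 1/16 respectively.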

7. General case. The rules by which fractional designs may be constructed
have been exemplified above and the procedure is simple, though laborious in the
case of a large number of objects. It does not, therefore, seem worth while to
enumerate particular designs for the weighing of particular numbers of objects.
A general procedure in considering the design for a particular problem is as
follows. Taking the case of a number $n$ of objects, the experimenter should
form a rough idea of the order of magnitude of the experimental error, $\sigma$ say, and
decide what accuracy he requires for his estimates of the weights, a standard
error say of $s$. Then if he weighs $2^p$ combinations of the objects, the standard
error of the estimate of each weight will be $2^{-p/2}\sigma$ in the case of the chemical
balance. This serves to determine $2^{p/2}$ and therefore $p$, and it is then necessary
to design a $2^n$ experiment of fraction 1 in $2^{n-p}$. Alternatively, a design of higher
fraction which can provide estimates may be replicated the corresponding number
of times. In the case of the spring balance the corresponding standard error is
$2^{-(p-2)/2}\sigma$, necessitating a design of higher fraction.
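The determination of $p$ can be written out as a short calculation. This sketch takes the variance of an estimated weight from $2^p$ weighings to be $\sigma^2/2^p$ for the chemical balance and $4\sigma^2/2^p$ for the spring balance, in line with the figures $\sigma^2/16$ and $\sigma^2/4$ found for sixteen weighings. The function name, and the added requirement that $2^p - 1$ contrasts be at least as many as the objects, are assumptions made for illustration.

```python
import math

def required_p(sigma, s, balance="chemical", n_objects=1):
    """Smallest p such that 2**p weighings give a standard error of at most s.

    chemical balance: 2**(-p/2) * sigma <= s
    spring balance:   2**(-(p - 2)/2) * sigma <= s
    """
    p = 2 * math.log2(sigma / s)
    if balance == "spring":
        p += 2
    p = max(0, math.ceil(p))
    # Assumption: one independent contrast is needed for each object.
    while 2 ** p - 1 < n_objects:
        p += 1
    return p

p_chemical = required_p(1.0, 0.25, balance="chemical")   # sigma/s = 4
p_spring = required_p(1.0, 0.25, balance="spring")
```

For $\sigma/s = 4$ the chemical balance requires $p = 4$, that is sixteen weighings, and the spring balance $p = 6$.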

Designs of the type described above have some useful properties: 

(1) the design automatically takes care of any bias in the balance, 

(2) the effects or weights may be computed easily as indicated above, 

(3) the effects or weights are uncorrelated, 

(4) all the effects are measured with the same precision, and 

(5) an estimate of the experimental error which is independent of the effects 
may be computed from the results. 

In considering the use for a particular problem of a design like the one discussed, 
it is important to understand completely the structure of the design. Such 
designs may well have real value for the weighing problem, but it is not easy to 
visualize other problems for which they would not give results capable of various 
interpretations. The use of the above designs depends on an assumption that 
interactions between pairs of factors are negligible, and this is generally not the 
case, for example, in biological research work, in which the experimenter may well 
be confident that interactions between three or more factors are negligible. In 
the particular case described in detail, there are only fifteen independent com¬ 
parisons between the sixteen results which will be obtained, and it follows from 
the identity relationship that the comparison giving the effect A, also measures 
the two factor interactions BC , DL , and FH. If therefore the factor a has no 
effect and there is an interaction between factors b and c or the other two pairs 
of factors, the experimenter will conclude that the factor a has an effect. It is 
clear that under these circumstances the experimental results are worthless. 


8. Efficiency of designs. In [2], Mood states for optimum designs that,
when $N$ weighings are made, the variances of the estimates of the weights are of
the order of $\sigma^2/N$ in the case of the chemical balance and $4\sigma^2/N$ in the case of the
spring balance, where $\sigma^2$ is the variance of a single weighing. As indicated above,
when $N$ is a power of 2, the fractional factorial designs result in the same
variances. These designs are similar to those proposed by Kishen [6].

Mood dealt with the case in which the number of weighings was restricted, 
for example to be equal to the number of objects, and defined a best design as that 
which gave the smallest confidence region in the p-dimensional space for the 
estimates of the p-weights. 

To take an example, for the weighing of 3 objects with a spring balance with
no bias he suggests the following design:

$$Z = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$

where the rows of the array refer to weighing operations and the columns refer
to objects. If the results of the weighings are $y_1$, $y_2$, and $y_3$ respectively, the
estimates of the weights $b_1$, $b_2$, and $b_3$ are given by the equation

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ -1/2 & 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}.$$

If $\sigma^2$ is the variance of a single weighing, then the variance of each estimate will
be $[(1/2)^2 + (1/2)^2 + (-1/2)^2]\sigma^2 = (3/4)\sigma^2$; or if $N$ ($\equiv 0 \pmod 3$) weighings are
made by replicating the above system $N/3$ times, the variance of each estimate
will be $9\sigma^2/4N$. The covariance of any two estimates is $(-1/4)\sigma^2$, so that the
correlation between any two estimates is $-1/3$ (and its square $1/9$). The fractional
factorial design will yield estimates which have a somewhat higher variance,
namely $4\sigma^2/N$. This increase in precision obtained in Mood’s design has been
obtained at the expense of obtaining correlated estimates which in addition are
subject to any bias which the measuring instrument may have. It is doubted
whether the use of such designs for any practical problem can be justified.
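The variance, covariance, and correlation quoted for this three-object design can be checked with exact arithmetic. The sketch below inverts the design matrix by Gauss-Jordan elimination over rationals; the helper `inverse` is written for this illustration and is not taken from Mood's paper.

```python
from fractions import Fraction

def inverse(matrix):
    """Exact inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

Z = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
Z_inv = inverse(Z)   # rows give the estimates b_i as combinations of y_1, y_2, y_3

# Covariance matrix of the estimates in units of sigma^2: Z_inv times its transpose.
cov = [[sum(Z_inv[i][k] * Z_inv[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]
correlation = cov[0][1] / cov[0][0]   # all variances are equal
```

The computation gives a variance of 3/4, a covariance of -1/4, and hence a correlation of -1/3 between any two estimates.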

It is of interest to note that the concept of fractional replication may be extended
to give designs requiring a number of weighings other than a power of
two. For the weighing of 3 objects, for example, a factorial design of fraction
3/4 could be used: it could consist of a half-replicate based on the identity I =
ABC, and a quarter replicate based on the identity

I = A = BC = ABC.

There is, however, little point in discussing such designs for the weighing problem, 
as their efficiency as measured by the total number of weighings needed to achieve 
a particular degree of accuracy is lower than for the designs described in this 
paper. 


REFERENCES 

[1] Harold Hotelling, “Some improvements in weighing and other experimental techniques”, Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[2] A. M. Mood, “On Hotelling’s weighing problem”, Annals of Math. Stat., Vol. 17 (1946), pp. 432-446.

[3] R. L. Plackett and J. P. Burman, “The design of optimum multifactorial experiments”, Biometrika, Vol. 33 (1946), pp. 305-325.

[4] F. Yates, “Complex experiments”, Roy. Stat. Soc. Jour. Suppl., Vol. 2 (1935), pp. 181-247.

[5] F. Yates, “The design and analysis of factorial experiments”, Technical Communication No. 35, Imperial Bureau of Soil Science, Harpenden, Herts, England, 1937.

[6] K. Kishen, “On the design of experiments for weighing”, Annals of Math. Stat., Vol. 14 (1945), pp. 294-301.

[7] D. J. Finney, “The fractional replication of factorial arrangements”, Annals of Eugenics, Vol. 12 (1945), pp. 291-301.

[8] D. J. Finney, “Recent developments in the design of field experiments. III. Fractional replication”, Jour. of Agric. Sci., Vol. 36 (1946), pp. 184-191.

[9] O. Kempthorne, “A simple approach to confounding and fractional replication in factorial experiments”, Biometrika, Vol. 34 (1947), pp. 255-272.



MULTIPLE SAMPLING FOR VARIABLES 

By Jack Silber 
Roosevelt College 

Summary. A multiple (sequential) sampling scheme designed to test certain
hypotheses is discussed under the following assumption: $X$ is a random variable
with density function P(x) which is piecewise continuous and differentiable at 
its points of continuity. Formulae are derived for the probability of acceptance 
and rejection of the hypothesis and for the expected number of samples necessary 
for reaching a decision. These formulae are found to depend on the solution of 
a Fredholm Integral equation. Explicit solutions to the problem are obtained 
for the case when P(x) is rectangular by reducing the fundamental integral 
equation to a set of differential-difference equations. Several examples are 
given. 

1. Introduction. A multiple sampling scheme is here proposed which is 
based on cumulative sums of random variables. Bartky [1] has developed a 
theory of multiple sampling for attributes when the attribute can take only two 
values with probability (p) and (1 — p) respectively. Formulae are there de¬ 
rived for the probabilities of acceptance and rejection of the null hypothesis and 
for the expected amount of sampling necessary for reaching a decision. In 
this paper the same type of formulae are developed for the case of variable samp¬ 
ling where the underlying probability law for the variable is given by a piecewise 
continuous function for which derivatives exist at its points of continuity. 

The whole theory of multiple sampling is closely related to Wald’s [2] theory 
of sequential tests. The fundamental difference is that in the latter, probabil¬ 
ities of errors of the first and second kinds are assigned, and acceptance and 
rejection criteria derived therefrom, while in the former the problem is solved in 
reverse order. There the acceptance and rejection criteria are assigned, and 
probabilities of eventual acceptance and rejection derived. For different parameter
values, these are the probabilities of making errors of the first and second
kinds. 

The problem presented here is similar to that given by Wald [3] in his paper 
on cumulative sums. In the present paper we waive the restriction that the 
expected number of items necessary for termination of the cumulating process 
be given explicitly as an integer. Since the theory given here is from the point 
of view of multiple sampling, the language appropriate to that theory will be 
used. 

2. The sampling scheme. Let $X$ be a random variable with probability
density function $P(x)$ which is piecewise continuous. One variate, say $x_1$,
is selected and if $x_1 \ge b$, the hypothesis (for example the null hypothesis with
respect to the mean) is accepted, and if $x_1 \le a$, the hypothesis is rejected. If,
however, $a < x_1 < b$, another variate $x_2$ is selected. In the latter case similar
criteria with respect to $x_1 + x_2$ determine whether the hypothesis is to be accepted
or rejected, or this method of sampling continued. Or more formally, let

$$S_r = \sum_{i=1}^{r} x_i \qquad (r = 1, 2, 3, \ldots),$$

where the cumulative sums $S_r$ are formed sequentially as follows: for any integer
$r$ the cumulating process is terminated by acceptance of the hypothesis if $S_r \ge b$
and rejection if $S_r \le a$, but, if $a < S_r < b$, another variate $x_{r+1}$ is selected and the
sum $S_{r+1}$ formed. The acceptance and rejection criteria are then applied as
above. No attempt is made here to indicate the choice of the acceptance and
rejection criteria.

3. The probability of acceptance. If at the rth unit the hypothesis is neither
accepted nor rejected, then it must be true that $a < S_r < b$. Let us denote the
probability that this condition holds by

$$\int_a^b Y_r(S_r)\, dS_r, \tag{3.1}$$

where $Y_r(S_r)$ is the probability density function for $S_r$ in the above described
sampling scheme. The probability density function for $S_{r+1}$ would then be given
by

$$Y_{r+1}(S_{r+1}) = \int_a^b Y_r(S_r)\, P(S_{r+1} - S_r)\, dS_r. \tag{3.2}$$

The probabilities of accepting or rejecting the hypothesis on the rth trial are
respectively$^1$

$$\int_b^{\infty} Y_r(S_r)\, dS_r, \qquad \int_{-\infty}^{a} Y_r(S_r)\, dS_r, \tag{3.3}$$

and therefore the probabilities for eventual acceptance or rejection are given
by

$$P_A = \sum_{r=1}^{\infty} \int_b^{\infty} Y_r(S_r)\, dS_r, \qquad P_R = \sum_{r=1}^{\infty} \int_{-\infty}^{a} Y_r(S_r)\, dS_r. \tag{3.4}$$

The probability that $a < S_n < b$ cannot exceed the probability that $a < T_n =
x_1 + x_2 + x_3 + \cdots + x_n < b$ on a single sample of $n$ variates, that is $\Pr(a <
S_n < b) \le \Pr(a < T_n < b)$. For distributions with positive variance, it can
be shown that the right member of the above inequality approaches zero as $n \to \infty$.
Therefore, the process of sampling as outlined above will eventually lead to
acceptance or rejection of the hypothesis. See Wald [3, p. 284] for a direct proof
that the probability that the left member of the above inequality holds for
$n = 1, 2, 3, \ldots$ is zero.





Consider the linear integral (Fredholm) equation

$$Y(x) = Y_1(x) + \lambda \int_a^b P(x - s)\, Y(s)\, ds, \tag{3.5}$$

where $Y_1(x) = P(x)$, and assume a solution of the form

$$Y(x) = Y_1(x) + \lambda Y_2(x) + \lambda^2 Y_3(x) + \cdots. \tag{3.6}$$

That solutions, in power series in $\lambda$, of the Fredholm equation exist when the
kernel $P(x - s)$ and the function $Y_1(x)$ have finite discontinuities is well known
and the theory has been expounded by several authors. (For example see
Goursat [4].) If the power series in $\lambda$ is substituted in the integral equation we
obtain


F,(x) + XF,(x) + X s F s (x) + ••• 

= F,(x) + X f [F,(s) + XFj(s) + X* Fi(s) + • • -]P(x - s) ds 

J a 

(3 ' 7) = F,(x) + X ^ Fj(s)P(x - s) ds + X* J* Fi(«)P(x - s) ds 

+ X’ / Fj(s)P(x - *) ds H-. 

Ja 

Equating coefficients of like powers of X we see that 

(3.8) ' F r (x) = f F r _x( s )P(x - s) ds, (r = 2, 3, • • •)• 

J a 


This, however, is the probability distribution for £ r , r = 2, 3, ■ • • under our 
sampling scheme, and therefore from the equations, 

(3.9) F(x) = E X r_1 F,(x) = Fi(x) +\f P(x - s)F(s) ds, 

Tmm 1 Ja 


we have that the probability of eventual acceptance for X = 1, is 

(3.10) E f b F r 0S r ) ds r = r F(x) dx. 

r—1 Jo 

Thus our problem of finding a formula for the probability of eventual acceptance 
or rejection of the statistical hypothesis under the above sampling scheme re¬ 
duces to that of finding a solution of a linear integral equation. 

The argument in this section has, of course, been entirely formal. However, 
from the general theory of integral equations we know that the series solution 
(3.6) converges uniformly for $\lambda < 1/[M(b - a)]$, where $P(x) < M$, since $P(x)$ 
is a probability density function. In equations (3.4) and (3.10) we give solutions 
for $\lambda = 1$ and of course we assume that $M(b - a) < 1$. Since (3.6) is uniformly 
convergent, the interchanges of integration and summation in (3.10) and (4.3) 
in the following section are allowable. 
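The successive-approximation argument of (3.6) to (3.9) can be tried numerically. The following is a minimal sketch, not the author's computation: it assumes a rectangular density of half-width c centred at zero, replaces the integral over (a, b) by a midpoint rule, and sums the terms $Y_1 + \lambda Y_2 + \cdots$ generated by the recursion (3.8). The function names and grid sizes are illustrative choices.

```python
def rect_density(u, c):
    """Rectangular density of half-width c centred at 0 (an assumed choice of P)."""
    return 1.0 / (2.0 * c) if -c < u < c else 0.0


def neumann_series(a, b, c, lam=1.0, n_grid=100, n_terms=60):
    """Sum Y = Y_1 + lam*Y_2 + lam^2*Y_3 + ... on a midpoint grid over (a, b)."""
    h = (b - a) / n_grid
    grid = [a + (i + 0.5) * h for i in range(n_grid)]
    term = [rect_density(x, c) for x in grid]          # Y_1(x) = P(x)
    total = list(term)
    for r in range(2, n_terms + 1):
        # recursion (3.8): Y_r(x) = integral over (a, b) of Y_{r-1}(s) P(x - s) ds
        term = [h * sum(rect_density(x - s, c) * y for s, y in zip(grid, term))
                for x in grid]
        total = [t + lam ** (r - 1) * y for t, y in zip(total, term)]
    return grid, total, h


def residual(grid, y, h, c, lam=1.0):
    """Largest violation of the discretised equation Y = P + lam * K Y."""
    return max(
        abs(yx - rect_density(x, c)
            - lam * h * sum(rect_density(x - s, c) * ys for s, ys in zip(grid, y)))
        for x, yx in zip(grid, y)
    )


grid, y, h = neumann_series(a=-0.5, b=0.5, c=1.0)
integral_over_ab = h * sum(y)   # integral of Y over (a, b)
```

With a = -0.5, b = 0.5, c = 1 the bound M(b - a) equals 1/2, so the series converges geometrically. In this particular case every step continues with probability 1/2 regardless of the current sum, so the integral of Y over (a, b) should come out as 1/2 + 1/4 + ... = 1.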





4. The expected amount of sampling. Since 

(4.1) $\int_a^b Y_{r-1}(S_{r-1})\,dS_{r-1}$ 

is the probability that the rth sample will be reached, then the probability that 
on the rth sample, the hypothesis will be either accepted or rejected becomes 

(4.2) $\int_a^b Y_{r-1}(S_{r-1})\,dS_{r-1} - \int_a^b Y_r(S_r)\,dS_r$; 

that is, the first term in this expression gives the probability that no terminating 
decision is made on the (r - 1)st sample and the second term gives the probability 
that a like decision is made on the rth sample. The difference of the 
two then gives the probability that a terminating decision (acceptance or rejection) 
will be made on the rth sample. The expected number of units sampled 
will therefore be 

(4.3) $E = 1 \cdot \left(1 - \int_a^b P(x)\,dx\right) + \sum_{r=2}^{\infty} r \left[\int_a^b Y_{r-1}(S_{r-1})\,dS_{r-1} - \int_a^b Y_r(S_r)\,dS_r\right]$ 
      $= 1 + \sum_{r=1}^{\infty} \int_a^b Y_r(S_r)\,dS_r = 1 + \int_a^b \sum_{r=1}^{\infty} Y_r(x)\,dx$ 
      $= 1 + \int_a^b Y(x)\,dx$. 


Thus, the amount of sampling expected before a terminating decision is reached 
also depends upon the solution of the integral equation. We proceed to discuss 
the problem when $P(x)$ is given by a rectangular distribution. 
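As a cross-check on the quantities defined in Sections 3 and 4, the sampling scheme itself can be simulated. The sketch below is an illustration under stated assumptions rather than anything from the paper: each unit is drawn from a rectangular distribution on (alpha - c, alpha + c), the cumulative sum is compared with the rejection criterion a and the acceptance criterion b, and the function name, seed and number of trials are arbitrary.

```python
import random


def simulate_scheme(a, b, c, alpha=0.0, trials=20000, seed=1):
    """Monte Carlo estimates of (P_A, P_R, E) for rectangular increments.

    Sampling stops with rejection when the cumulative sum is <= a and with
    acceptance when it is >= b; otherwise another unit is drawn.
    """
    rng = random.Random(seed)
    accepted = rejected = total_units = 0
    for _ in range(trials):
        s, n = 0.0, 0
        while True:
            s += rng.uniform(alpha - c, alpha + c)
            n += 1
            if s >= b:
                accepted += 1
                break
            if s <= a:
                rejected += 1
                break
        total_units += n
    return accepted / trials, rejected / trials, total_units / trials


p_accept, p_reject, expected_units = simulate_scheme(a=-0.5, b=0.5, c=1.0)
```

For a = -0.5, b = 0.5, c = 1, alpha = 0 the chance of continuing is 1/2 at every step, so the number of units is geometric with mean 2, and by symmetry acceptance and rejection are equally likely. The estimates should land close to those values.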


5. Reduction to differential equations when P(x) is rectangular. Consider 
the integral equation 

(5.1) $Y^*(z) = P^*(z) + \lambda \int_a^b P^*(z - t)\,Y^*(t)\,dt$, 

where 

(5.2) $P^*(z) = 1/(2c)$ for $-c < z - \alpha < +c$; 
      $P^*(z) = 0$ for $z - \alpha > c$ or $z - \alpha < -c$, 

and in the integral equation 

$a + \alpha - c < z < b + \alpha + c$. 

The parameter $\alpha$ is restricted to the values $-c < \alpha < c$ for the following reasons. 
The rejection criterion a cannot be greater than $c + \alpha$ for, if so, rejection will be 
automatic on the first sample. Similarly the acceptance criterion b must be 
greater than $-c + \alpha$ for otherwise acceptance would be automatic on the first 
trial. If $\alpha > c$ then rejection can never take place if it does not take place on 
the first trial, for in this case all $z > 0$. Similarly, if $\alpha < -c$ then acceptance 
can never take place if it does not take place on the first trial, for in this case all 
$z < 0$. Furthermore, in obtaining solutions of the integral equation, we will 
take $\alpha$ to be $\ge 0$. This inequality is no real restriction since solutions for negative 
$\alpha$ can be obtained by symmetry from the solutions for positive $\alpha$. 


If we let $x = z - \alpha$ then 

(5.3) $Y^*(x + \alpha) = P^*(x + \alpha) + \lambda \int_a^b P^*(x + \alpha - t)\,Y^*(t)\,dt$, 

(5.4) $Y(x) = P(x) + \lambda \int_a^b P(x - t)\,Y^*(t)\,dt$, 

where 

(5.5) $P(x) = 1/(2c)$ for $-c < x < +c$; 
      $P(x) = 0$ for $x < -c$ or $x > +c$. 

Now let $s = t - \alpha$; then 

(5.6) $Y(x) = P(x) + \lambda \int_{a-\alpha}^{b-\alpha} P(x - \alpha - s)\,Y^*(s + \alpha)\,ds$ 
      $= P(x) + \lambda \int_{a-\alpha}^{b-\alpha} P(x - \alpha - s)\,Y(s)\,ds$. 

We have thus transformed our equation to one in which $P(x)$ becomes symmetrical 
with respect to the line $x = 0$. Furthermore, the probability of acceptance 
becomes 

(5.7) $P_A = \int_{b-\alpha}^{\infty} Y(x)\,dx$, 

and the expected amount of sampling becomes 

(5.8) $E = 1 + \int_{a-\alpha}^{b-\alpha} Y(x)\,dx$. 

Also, x now has the following range: $a - c < x < b + c$. If we now make the 
transformation $x - \alpha - s = y$, then 

(5.9) $Y(x) = P(x) + \lambda \int_{x-b}^{x-a} P(y)\,Y(x - \alpha - y)\,dy$, 


and the following cases present themselves. 

If $x - a < -c$ or $x - b > +c$, then $Y(x) \equiv P(x)$, since $P(y) = 0$. 

If $x - b < -c < x - a < +c$, then 

(5.10) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{-c}^{x-a} Y(x - \alpha - y)\,dy$, 

where 

(5.11) $a - c < x < a + c$ when $b - a > 2c$, 
       $a - c < x < b - c$ when $b - a < 2c$. 

If $x - b < -c < +c < x - a$, then 

(5.12) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{-c}^{+c} Y(x - \alpha - y)\,dy$, 

where 

(5.13) $a + c < x < b - c$ and $b - a > 2c$. 

If $-c < x - b < x - a < +c$, then 

(5.14) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{x-b}^{x-a} Y(x - \alpha - y)\,dy$, 

where 

(5.15) $b - c < x < a + c$ and $b - a < 2c$. 

If $-c < x - b < +c < x - a$, then 

(5.16) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{x-b}^{+c} Y(x - \alpha - y)\,dy$, 

where 

(5.17) $b - c < x < b + c$ when $b - a > 2c$, 
       $a + c < x < b + c$ when $b - a < 2c$. 

Transforming back to the variable s, we have for the case $b - a > 2c$, 

(5.18) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{x-\alpha+c} Y(s)\,ds$ for $a - c < x < a + c$, 
       $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{x-\alpha+c} Y(s)\,ds$ for $a + c < x < b - c$, 
       $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{b-\alpha} Y(s)\,ds$ for $b - c < x < b + c$, 

and for the case $b - a < 2c$, 

(5.19) $Y(x) = P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{x-\alpha+c} Y(s)\,ds$ for $a - c < x < b - c$, 
       $= P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{b-\alpha} Y(s)\,ds$ for $b - c < x < a + c$, 
       $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{b-\alpha} Y(s)\,ds$ for $a + c < x < b + c$. 
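The case analysis leading to (5.18) and (5.19) amounts to intersecting two intervals for s: the range $(a - \alpha, b - \alpha)$ over which the integral is taken, and the range $(x - \alpha - c, x - \alpha + c)$ on which the rectangular kernel is non-zero. A small sketch of that bookkeeping follows; the function name is hypothetical and the numerical values in the usage lines are arbitrary.

```python
def integration_limits(x, a, b, c, alpha):
    """Limits (lo, hi) of the s-integral for a given x, or None if it is empty.

    The limits are the overlap of (a - alpha, b - alpha) with
    (x - alpha - c, x - alpha + c), which reproduces each listed case.
    """
    lo = max(a - alpha, x - alpha - c)
    hi = min(b - alpha, x - alpha + c)
    return (lo, hi) if lo < hi else None


# b - a > 2c: the three ranges of x give three different pairs of limits
wide = [integration_limits(x, a=0.0, b=5.0, c=1.0, alpha=0.5) for x in (0.5, 3.0, 5.5)]
# b - a < 2c: in the middle range the integral runs over all of (a - alpha, b - alpha)
narrow = integration_limits(0.5, a=0.0, b=1.0, c=1.0, alpha=0.0)
```

Writing the limits as a max and a min avoids having to decide in advance which of the two cases $b - a > 2c$ or $b - a < 2c$ applies.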





In all of the above equations, the integral is a continuous function of $x, a, \alpha, b, c$, 
while $P(x)$ has a discontinuity at $x = +c$ and $x = -c$, the jump at these points 
being of amount $1/2c$. The function $Y(x)$ will therefore be such that 

(5.20) $Y(-c + 0) - Y(-c - 0) = 1/2c$, 
       $Y(c - 0) - Y(c + 0) = 1/2c$. 

If we now differentiate the above sets of integral equations with respect to x we 
obtain the following sets of differential-difference equations for the case $\lambda = 1$. 
If $b - a > 2c$, 

(5.21) $Y'(x) = \frac{1}{2c}\,Y(x - \alpha + c)$ for $a - c < x < a + c$, 
       $= \frac{1}{2c}\,\{Y(x - \alpha + c) - Y(x - \alpha - c)\}$ for $a + c < x < b - c$, 
       $= -\frac{1}{2c}\,Y(x - \alpha - c)$ for $b - c < x < b + c$, 

and, if $b - a < 2c$, 

(5.22) $Y'(x) = \frac{1}{2c}\,Y(x - \alpha + c)$ for $a - c < x < b - c$, 
       $= 0$ for $b - c < x < a + c$, 
       $= -\frac{1}{2c}\,Y(x - \alpha - c)$ for $a + c < x < b + c$, 

the derivatives holding for all points except at $x = -c$ and $x = +c$. 

Although a technique has been devised to solve the above equations for finite 
a and b, mathematical difficulties of a computational character are encountered 
when $(b - a)$ is made large. Note that there are only three essential parameters 
in the above problem since c can be taken as the unit of measurement. In the 
technique illustrated by the following examples, $\alpha$ has been fixed as has $(b - a)$, 
i.e. the solutions shown in the examples below are general only insofar as one 
parameter is concerned. The essential feature of the technique is that the range 
of $Y(x)$ has been further subdivided so as to make its points of discontinuity end 
points of subdivisions of its range, and thus $Y(x)$ becomes continuous from the 
right or left in every subinterval of its range. 

6. Example I: $b - a = 2c$, $c = 1$, $\alpha = 0$. In this case x ranges from $(a - 1)$ 
or $(-c)$, whichever is smaller, to $(b + 1)$ or $(+c)$, whichever is larger. If 
$-c < a - 1$, then $Y(x) = P(x)$ for $-c < x < a - 1$, and if $b + 1 < +c$, 
then $Y(x) = P(x)$ for $b + 1 < x < +c$. For x between $a - 1$ and $b + 1$ the 
domain of the differential-difference equations is divided as follows, where a 
is now restricted to the interval $-1 \le a \le 0$. 





(6.1) $Y_i'(x) = \tfrac{1}{2}\,Y_{i+2}(x + 1)$, where for 
        i = 1, $a - 1 < x < -1$, 
        i = 2, $-1 < x < a$, 
        i = 3, $a < x < 0$, 
        i = 4, $0 < x < a + 1$; 
      $Y_i'(x) = -\tfrac{1}{2}\,Y_{i-2}(x - 1)$, where for 
        i = 5, $a + 1 < x < +1$, 
        i = 6, $+1 < x < a + 2$, 
        i = 7, $a + 2 < x < +2$, 
        i = 8, $+2 < x < a + 3$. 

The above are the equations corresponding to (5.21) for the given example. 

Differentiating the equations for i = 3, 4, 5, 6 and making certain obvious 
substitutions we obtain the following second order differential equations, 

(6.2) $Y_i''(x) = -\tfrac{1}{4}\,Y_i(x)$, i = 3, 4, 5, 6, 

where the domains for x are as in (6.1). If we solve the equations (6.2) and substitute 
in the remaining equations in (6.1) we obtain the following set of equations, 

(6.3) $Y_i(x) = A_{i+2} \sin \tfrac{1}{2}(x + 1) - B_{i+2} \cos \tfrac{1}{2}(x + 1) + K_i$, i = 1, 2, 
      $Y_i(x) = A_i \cos \tfrac{1}{2}x + B_i \sin \tfrac{1}{2}x$, i = 3, 4, 5, 6, 
      $Y_i(x) = -A_{i-2} \sin \tfrac{1}{2}(x - 1) + B_{i-2} \cos \tfrac{1}{2}(x - 1) + K_i$, i = 7, 8, 

where again the domains are as in (6.1). 

From continuity considerations we have the boundary conditions 

$Y_1(a - 1) = Y_8(a + 3) = 0$, $Y_1(-1) + \tfrac{1}{2} = Y_2(-1)$, $Y_2(a) = Y_3(a)$, 
$Y_3(0) = Y_4(0)$, $Y_4(a + 1) = Y_5(a + 1)$, $Y_5(1) = Y_6(1) + \tfrac{1}{2}$, 
$Y_6(a + 2) = Y_7(a + 2)$, $Y_7(2) = Y_8(2)$. 

These boundary conditions yield certain relationships between the constants. 
The equations so determined, however, do not form a consistent set of linear 
equations in the $A_i, B_i, K_i, \ldots$. If we integrate out the equations (5.18) 
sectionally, the following relationships between the constants are obtained. 

Ai = A i+ s sin b — B i+ t cos £, 2?» = £,+2 cos b + B i+2 sin i, % = 3,4, 

/£* = — C4 4 — i4») sin }(« + 1) — (Bi — A) cos i(a + 1) 

= § + ft — + Ki > 

K 7 = i4 3 — A a + K s , K 8 = i4 fi sin J(« + 2) — #« cos £(a + 2), 

(6.4) 

ft = J + ft + ft - ^ 4 + 4* + (^4 - At) sin b(a + 1) 

+ (Bi - Ba) cos i(a + 1), 

^. 4 = A$ ”f- K$ 4" (Ai ~ At) sin }(« + 1) + (ft - Bi) cos b(fl + 1)> 
ft = — As sin | + ft cos ~. 





From these equations it is easily seen that $A_4 = A_5$ and $K_1 = K_2 = K_7 = K_8$. 
Furthermore, the following set of consistent linear equations is obtained, after 
several simple manipulations and substitutions. 


{si 


sin £(a + 2) + sin - ■ sin £f A® 

z 


*}• 


— jcos || B s + jcos £(a + 2) + sin | cos B® — 0, 

(6.5) sin £(a + 2) + cos £(a +2) — cos | • Bin A® + jsin B* 

+ jsin £(a + 2) + cos £(a + 2) — cos | • cos jj Be = 0, 


{cos £} At — Bj + {sin £} Be = 0. 


All the other constants can be obtained from the solutions for At , B 3 , B e in 

(6.5). Letting A equal the determinant of coefficients in (6.5) and using the 
relationships (6.4) we obtain the following solutions: 


A = 2 — 2 sin £ — cos £, 

A A 4 = £ {cos £ — cos a/2 • sin £(a + 1)) = A At , 

A Bi = £{sin £ — sin a/2 • sin £(a + 1) + cos £ —1}, 

AA« = £{sin 1 — cos £ 4* cos a/2 • cos £(a 4-2)}, 

(6.6) A Be = £{sin £(a 4- 2) cos a/2 — sin £ — cos 1}, 

AB 8 = £{1 — sin a/2 • sin £(a 4- 1) — sin £}, 

A At — £{cos £ — sin 2 £(a 4- 1)}, 

ABj = £{sin £ 4" sin £(a 4“ 1) • cos £(a 4- 1) —1}, 

AK X = i {cos | sin i(a + 2)| = AK t = AK 7 = A/C„. 

If we now integrate $Y(x)$, equation (6.3), sectionally, i.e. from the left end point 
to the right end point of each sub-interval of its range, and then add up appropriate 
areas, we obtain the following formulae for the probabilities of acceptance 
and rejection and for the expected amount of sampling: 


P H = i {cos £(a 4- 1) 4- sin a/2 — cos a/2 4“ AA^), 
Pa = ^ {2 — cos £ — 2 sin £ 4- sin £(a 4* 1) 


— cos £(a 4- 1) — sin a/2 4- AK*}, 
E *■ ~ {cos a/2 — 2 sin a/2 — sin £ (a 4* !)}• 


(6.7) 





7. Example II: $\alpha = 1$, $c = 3$, $b - a = 4$. In this case, as in the previous one, 
$Y(x) = P(x)$ for $-3 < x < a - 3$ when $a - c = a - 3 > -3$, and if $b + c = 
a + 7 < 3$ then $Y(x) = P(x)$, $a + 7 < x < 3$. For $a - 3 < x < a + 7$, where 
a takes on only integral values between -5 and 3, we have the following set of 
differential-difference equations: 

(7.1) $Y'_{a+j}(x) = \tfrac{1}{6}\,Y_{a+j+2}(x + 2)$, j = -3, -2, -1, 0; 
      $= 0$, j = 1, 2; 
      $= -\tfrac{1}{6}\,Y_{a+j-4}(x - 4)$, j = 3, 4, 5, 6. 

If we integrate the above equations for j = 1, 2, substitute in the equations for 
j = -1, 0, 5, 6, integrate, and then substitute in the remaining equations, we 
obtain the solutions 

(7.2) $Y_{a+j}(x) = \tfrac{1}{72} A_{a+j+4}(x + 2)^2 + \tfrac{1}{6} A_{a+j+2}\,x + A_{a+j}$, j = -3, -2; 
      $= \tfrac{1}{6} A_{a+j+2}\,x + A_{a+j}$, j = -1, 0; 
      $= A_{a+j}$, j = 1, 2; 
      $= -\tfrac{1}{72} A_{a+j-2}(x - 4)^2 - \tfrac{1}{6} A_{a+j-4}\,x + A_{a+j}$, j = 3, 4; 
      $= -\tfrac{1}{6} A_{a+j-4}\,x + A_{a+j}$, j = 5, 6. 
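A quick numerical check of the integration that leads from (7.1) to (7.2) is sketched below. It assumes the reading of (7.2) in which the constant pieces are $A_{a+j}$ (j = 1, 2), the linear pieces have slope $\pm\tfrac{1}{6}$ times a constant, and the quadratic pieces carry the factor $\tfrac{1}{72}$. The constants are arbitrary test values, the pieces are indexed by j alone, and each piece is treated as a formula without regard to its interval of validity.

```python
def make_pieces(A):
    """Return Y(j, x) for j = -3..6, with A[j] standing for the constant A_{a+j}."""
    def Y(j, x):
        if j in (1, 2):
            return A[j]
        if j in (-1, 0):
            return A[j + 2] * x / 6 + A[j]
        if j in (5, 6):
            return -A[j - 4] * x / 6 + A[j]
        if j in (-3, -2):
            return A[j + 4] * (x + 2) ** 2 / 72 + A[j + 2] * x / 6 + A[j]
        if j in (3, 4):
            return -A[j - 2] * (x - 4) ** 2 / 72 - A[j - 4] * x / 6 + A[j]
        raise ValueError("j must lie between -3 and 6")
    return Y


def max_defect(A, x=0.37, step=1e-3):
    """Largest gap between a central-difference Y' and the right side of (7.1)."""
    Y = make_pieces(A)
    worst = 0.0
    for j in range(-3, 7):
        derivative = (Y(j, x + step) - Y(j, x - step)) / (2 * step)
        if j <= 0:
            rhs = Y(j + 2, x + 2) / 6
        elif j <= 2:
            rhs = 0.0
        else:
            rhs = -Y(j - 4, x - 4) / 6
        worst = max(worst, abs(derivative - rhs))
    return worst


constants = {j: 0.1 * (j + 5) for j in range(-3, 7)}   # arbitrary test values
defect = max_defect(constants)
```

Because every piece is at most quadratic, the central difference is exact up to rounding, so `defect` should be negligibly small if the pieces satisfy (7.1).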


As in the previous example we now use (5.22). Integrating out (5.22) sectionally, 
certain relationships between the $A_{a+j}$, $j = -3, -2, \ldots, 6$, are obtained. 
These yield 

Aa+ 1 = *{12/V> + 12P. + 39P«+, + 9P.+*), 

= *{12P a _, + 12P 0 + HPo+i + 37P a+2 |, 


A„-i -£ {4P.-1 + 4P a + 13P 0+ i + 3P 0 +i} 

5o 

(7.3) + tts f 228P«_i + 60Pa + 55P a+1 + 17P.+,}, 

X. « - i| g j 12Pa-l + 12P a + llPa+1 + 37P«+j) 

+ T^|00Pa_, + 228P. + 55P„+i + 17P„ + i), 

where $P_{a+j}$ is the value of $P(x)$ for $a + j < x < a + j + 1$, $j = -3, \ldots, 6$. 
All of the other constants can be found in terms of those given in equations 
(7.3). If we now integrate (7.2) sectionally and perform several simple manipulations, 
we arrive at the following formulas: 

n n , 9 a — 5 A ,3a — 1 A , 3 A 1. 

1 * “ ^ 216 - A “ +1 ^ 216~ 12 ' 4a ~ 1 12 Ac ' 

(7.4) Pa = 2 £*.+/ + ~~2i Q ~ ^ a + 1 + 216 ~ ^ 1 + A-4.-1 + A-4«» 

E = 1 + ^ — A»+i + ^ + -4o+i + A a . 





Although $P_{a+j}$, j = -6, -5, -4, 7, 8, 9, have not appeared in previous derivations 
in this example, they have been inserted in the above formulas to cover the 
cases in which $a - c > -c$ or $b + c < c$. 

It should be mentioned that Kac [5] obtained the distribution of n (the ex¬ 
pected amount of sampling) by a process similar to that given in this paper. It 
is also interesting to note that the present paper could have been written entirely 
in the language of problems in Random Walk. 

The author has also worked on the cases in which the distribution $P(x)$ is triangular 
or parabolic. In these, as in the case of the rectangular distribution 
discussed in this paper for $b - a$ large, the equivalent differential-difference 
equations are of large orders, making the computation of solutions extremely 
tedious. As Kac [5] pointed out, the task of obtaining solutions in closed form 
for the case when $P(x)$ is the normal law is extremely difficult. 

REFERENCES 

[1] W. Bartky, “Multiple sampling with constant probability”, Annals of Math. Stat., 
Vol. 14 (1942), p. 363. 

[2] A. Wald, “Sequential tests of statistical hypotheses”, Annals of Math. Stat., Vol. 16 
(1946), p. 17. 

[3] A. Wald, “On cumulative sums of random variables”, Annals of Math. Stat., Vol. 16 
(1944), pp. 283-284. 

[4] E. Goursat, Cours d’Analyse Mathématique, Gauthier-Villars, Paris, 1923, Vol. 3. 

[5] M. Kac, “Random walk in the presence of absorbing barriers”, Annals of Math. Stat., 
Vol. 16 (1946), p. 62. 



ON THE CHARACTERISTIC FUNCTIONS OF THE DISTRIBUTIONS OF 
ESTIMATES OF VARIOUS DEVIATIONS IN SAMPLES 
FROM A NORMAL POPULATION 

By M. Kac 
Cornell University 


1. Summary. An explicit formula for the characteristic function of the 
deviation 

$\frac{1}{n} \sum_{k=1}^{n} |X_k - \bar{X}|^{\alpha}, \qquad \alpha > 0,$ 

is derived for samples from a normal population. For $\alpha = 1$ one can calculate 
the probability density function but the result does not seem to be in complete 
agreement with a recent formula of Goodwin [1]. 

2. Introduction. Let $X_1, X_2, \ldots, X_n$ be independent, normally distributed 
random variables each having mean 0 and variance 1. 

Let, as usual, 

$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ 

and denote by $Y_n(\alpha)$ the deviation 

(1) $Y_n(\alpha) = \frac{1}{n} \sum_{k=1}^{n} |X_k - \bar{X}|^{\alpha}, \qquad \alpha > 0$. 

The purpose of this note is to show that 


F n (Q « E{exp («K n (a))} 


= Vniv^r 1 L [I . e_l1 


■*/2 e *Vn({|*|«-hi7z) 


"In 

: drj. 


It is easy to check that for a = 2 one obtains the well known expression 

(i _ 

Moreover, if $\alpha = 1$ one can actually find the probability density of $Y_n(1)$. The 
resulting expression is fairly complicated and it strongly resembles an expression 
recently obtained by Goodwin [1]. Except for the relatively simple case 
n = 3, it does not seem easy to verify that our formula is equivalent to that of 
Goodwin. 

Although deviations corresponding to values of $\alpha$ different from 1 and 2 are of 
little practical value, the explicit formula (2) may be of some interest. It is 
also hoped that the method of derivation may prove useful in other cases. 
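For $\alpha = 2$ the deviation reduces to a familiar case that is easy to check by simulation: $n\,Y_n(2)$ is the sum of squared deviations from the sample mean, which for unit-variance normal samples follows a chi-square distribution with n - 1 degrees of freedom, so that $E\{\exp(i\xi Y_n(2))\} = (1 - 2i\xi/n)^{-(n-1)/2}$. The sketch below compares a Monte Carlo estimate of the characteristic function with this standard closed form. It is an illustration only; the function names, seed and sample sizes are arbitrary, and it does not implement formula (2).

```python
import cmath
import random


def empirical_cf(xi, n, alpha, trials=20000, seed=7):
    """Monte Carlo estimate of E exp(i*xi*Y_n(alpha)) for N(0, 1) samples."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        y = sum(abs(x - mean) ** alpha for x in xs) / n   # the deviation (1)
        acc += cmath.exp(1j * xi * y)
    return acc / trials


def chi_square_cf(xi, n):
    """Closed form for alpha = 2: n*Y_n(2) is chi-square with n - 1 degrees of freedom."""
    return (1 - 2j * xi / n) ** (-(n - 1) / 2)


estimate = empirical_cf(xi=1.0, n=5, alpha=2)
exact = chi_square_cf(xi=1.0, n=5)
```

The same `empirical_cf` can be called with `alpha=1` to obtain a reference value against which any candidate formula for the mean deviation could be compared.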



3. The derivation of (2). We start with the observation that 

$\bar{X}$ and $Y_n(\alpha)$ 

are statistically independent (see e.g. Daly [2]). 

Denote by 

$E^*\{\,|\bar{X}| < \epsilon,\ \exp(i\xi Y_n(\alpha))\}$ 

the integral of $\exp(i\xi Y_n(\alpha))$ extended over that portion of the sample space in 
which $|\bar{X}| < \epsilon$. Thus the conditional expectation $E\{\exp(i\xi Y_n(\alpha)) \mid |\bar{X}| < \epsilon\}$ 
is given by the formula 

$E\{\exp(i\xi Y_n(\alpha)) \mid |\bar{X}| < \epsilon\} = \frac{E^*\{\,|\bar{X}| < \epsilon,\ \exp(i\xi Y_n(\alpha))\}}{\mathrm{Prob}\{\,|\bar{X}| < \epsilon\}}$. 

Because of the independence of $\bar{X}$ and $Y_n(\alpha)$ we have 

(3) $E\{\exp(i\xi Y_n(\alpha))\} = \frac{E^*\{\,|\bar{X}| < \epsilon,\ \exp(i\xi Y_n(\alpha))\}}{\mathrm{Prob}\{\,|\bar{X}| < \epsilon\}}$. 

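For the sake of simplicity we assume now that $\alpha \ge 1$ and note that 

$\left|\exp(i\xi Y_n(\alpha)) - \exp\left(\frac{i\xi}{n} \sum_{k=1}^{n} |X_k|^{\alpha}\right)\right| \le \frac{|\xi|}{n} \sum_{k=1}^{n} \left|\,|X_k|^{\alpha} - |X_k - \bar{X}|^{\alpha}\right|$ 
$\le \frac{\alpha |\xi|\,|\bar{X}|}{n} \sum_{k=1}^{n} (|X_k| + |\bar{X}|)^{\alpha - 1}$. 

Thus, on the portion of the sample space where $|\bar{X}| < \epsilon$, we have 

$\left|\exp(i\xi Y_n(\alpha)) - \exp\left(\frac{i\xi}{n} \sum_{k=1}^{n} |X_k|^{\alpha}\right)\right| \le \frac{\alpha |\xi|\,\epsilon}{n} \sum_{k=1}^{n} (|X_k| + \epsilon)^{\alpha - 1}$ 

and consequently 

$\left|E^*\{\,|\bar{X}| < \epsilon,\ \exp(i\xi Y_n(\alpha))\} - E^*\left\{|\bar{X}| < \epsilon,\ \exp\left(\frac{i\xi}{n} \sum_{k=1}^{n} |X_k|^{\alpha}\right)\right\}\right|$ 
$\le \frac{\alpha |\xi|\,\epsilon}{n}\, E^*\left\{|\bar{X}| < \epsilon,\ \sum_{k=1}^{n} (|X_k| + \epsilon)^{\alpha - 1}\right\}$. 

Clearly $E^*\left\{|\bar{X}| < \epsilon,\ \sum_{k=1}^{n} (|X_k| + \epsilon)^{\alpha - 1}\right\}$ approaches 0 as $\epsilon$ approaches 0, 
hence by (3) 

(4) $E\{\exp(i\xi Y_n(\alpha))\} = \lim_{\epsilon \to 0} \frac{E^*\left\{|\bar{X}| < \epsilon,\ \exp\left(\frac{i\xi}{n} \sum_{k=1}^{n} |X_k|^{\alpha}\right)\right\}}{\mathrm{Prob}\{\,|\bar{X}| < \epsilon\}}$. 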




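Using the fact that 

$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin \epsilon\eta}{\eta} \exp(i\eta \bar{X})\,d\eta$ 
$= 1$ for $|\bar{X}| < \epsilon$, 
$= \tfrac{1}{2}$ for $|\bar{X}| = \epsilon$, 
$= 0$ for $|\bar{X}| > \epsilon$, 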


we obtain easily 

1*1 < «, exp(| £ I r)} 

- \ r — ^{explZ (|| X k r + vX") ) dr,. 
* v { n i ) 

The justification of interchanging the order of integration (from $-\infty$ to $\infty$) and 
the operation E can be made quite simply (see e.g. Kac and Steinhaus [3]). 

Notice now that 



{ exp i±({|X*r +„*>)} 

= /«, exp (“ l) exp % n ^ I X <&] = ^(f. v) 

and that *> n (£, y) is absolutely integrable in (— 00 , oo)asa function of y. 

Thus, as € —► 0, 

(6) i f #>n(£, rj) dy ~ - f <p n (£, y) dy. 

T v u " 00 ^ w ■»00 

Furthermore (as t —* 0) 

(7) Prob { | X | < « ) ~ 2* 
and combining this with (6), (5) and (4) we get 

(8) tfjexp (ifFn(a))) = ^ d * 

This, of course, is equivalent to (2). 


4. Density function of the mean deviation. If a = 1 one can obtain an ex¬ 
pression for the probability density f n (P) of F n (a). We note first that 

exp exp i (f | x | + yx) dx 

** n l exp (" eXp + ^ z * 


+ n 




exp i(£ — y)x dx = n[R(( + y) + R(( — 17 )} 





R(u) * J exp ^ — exp (iux) dx. 


Using (2) (with a = 1) we obtain 


fn(f) _ v^(V 2 ;) n+1 1. [Is CO fii(f + 

Let us first look at the summands corresponding to k = 0 and k = n. We have 

R n (t - v )d v = f «"(,) d„ = f fl"({ + ,) d,. 

J— 00 J— 00 J— 00 

Now, /2(iy) is the Fourier transform of 


0, z < 0, 


(-•¥)■ 


x > 0, 


and hence # n (i;) is the Fourier transform of the convolution 

r * r * • • • * r = f tB) (*). 


It is easily seen (using integration by parts) that 


-(m) 


for large 1 ij | and hence for n > 2, R n (v) is absolutely integrable in ( — oo, oo). 
It follows (classical inversion formula) that 

J R n (ri' drj = 2irf (n> (0). 

Since for n > 2, f <n> (z) is continuous and f <n) (z) = 0 for x < 0 we 
have f <n) (0) = 0. Thus 

«»- (7S5* I! 0 £ «*« + - -»*• 

It is fairly easy to check that 

£ + ,)«"-*({ - ,) d, = t £ exp (j) f'""* 1 dx 


so that 


Fn(f) “ (V^) B+1 1- exp { * x) £ C0 fW © f<n_M (i) dx ' 





Finally, 



I have not been able, except for n = 3, to verify directly that this formula is 
identical with that of Goodwin. 


REFERENCES 

[1] H. J. Goodwin, “On the distribution of the estimate of mean deviation obtained from 
samples from a normal population,” Biometrika, Vol. 33 (1945), pp. 254-256. 

[2] J. F. Daly, “On the use of the sample range in an analogue of Student’s t-test,” Annals 
of Math. Stat., Vol. 17 (1946), pp. 71-74. 

[3] M. Kac and H. Steinhaus, “Sur les fonctions indépendantes III,” Stud. Math., Vol. 
6 (1936), pp. 93-94. 



NOTES 

This section is devoted to brief research and expository articles and other short items. 


A FUNCTIONAL EQUATION FOR WISHART’S DISTRIBUTION 

By G. Rasch 

State Serum Institute and University of Copenhagen 

1. Introduction. The sampling distribution of the moment matrix for ob¬ 
servations from a multivariate normal distribution was given by Wishart in 
1928 [1]. This proof involved rather advanced multidimensional geometry but 
since then two analytical proofs have been given: one by Wishart and Bartlett 
in cooperation with Ingham by the use of the characteristic function [2] and a 
second by Hsu by induction with regard to the dimension of the observations [3]. 

In the following section is given a new derivation of the form of Wishart’s dis¬ 
tribution in which a fundamental property of the multivariate normal distribu¬ 
tion is utilized, viz. the invariance of the distribution type against a linear trans¬ 
formation. In section 3 the same principle is used for evaluation of the constant 
and determination of the moment matrix in the multidimensional normal 
distribution. 


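2. Derivation of Wishart’s distribution. Let¹ 

(1) $\mathbf{x} = (x_1, \ldots, x_k)$ 

denote a k-dimensional normal variate with the mean vector 0 and the distribution 
matrix 

(2) $\mathbf{\Phi} = (\varphi_{ij})$, 

viz. 

(3) $p\{\mathbf{x}\} = \frac{\sqrt{\Delta(\mathbf{\Phi})}}{(\sqrt{2\pi})^{k}}\, e^{-\frac{1}{2}\mathbf{x}\mathbf{\Phi}\mathbf{x}^*}$. 

$\mathbf{\Phi}$ is symmetrical and positive definite. 

Now consider n observations of $\mathbf{x}$: $\mathbf{x}_1, \ldots, \mathbf{x}_n$, which are stochastically 
independent. Their joint distribution is 

(4) $p\{\mathbf{x}_1, \ldots, \mathbf{x}_n\} = \left(\frac{\sqrt{\Delta(\mathbf{\Phi})}}{(\sqrt{2\pi})^{k}}\right)^{n} e^{-\frac{1}{2}\sum_{\nu} \mathbf{x}_\nu \mathbf{\Phi} \mathbf{x}_\nu^*}$. 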


The estimation of $\mathbf{\Phi}$ is based upon the moment sums 

$m_{ij} = \sum_{\nu} x_{\nu i}\, x_{\nu j}$, 

which form the symmetrical, positive definite matrix 

(5) $\mathbf{M} = (m_{ij}) = \sum_{\nu} \mathbf{x}_\nu^* \mathbf{x}_\nu$. 

¹ Notations: Lower case latin and greek letters are scalars; boldface capital latin and 
greek letters denote matrices, and boldface lower case letters row vectors. * means 
transposition. $\Delta(\mathbf{A})$ stands for the determinant of the square matrix $\mathbf{A}$. 

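In order to derive the distribution of $\mathbf{M}$ the straightforward procedure seems to 
be to transform the distribution of the sample $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ to a distribution of 
$\mathbf{M}$ and some other variables which then should be integrated away. As such, 
the transformation, 

(6) $\mathbf{x}_\nu = \mathbf{u}_\nu \mathbf{M}^{1/2}, \qquad \sum_{\nu} \mathbf{u}_\nu^* \mathbf{u}_\nu = \mathbf{1}$, 

might serve. The matrix 

(7) $\mathbf{U} = \begin{pmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_n \end{pmatrix}$ 

contains nk elements linked together with $\frac{k(k+1)}{2}$ relations; $(\mathbf{U})$ symbolizes 
$nk - \frac{k(k+1)}{2}$ of the elements taken as independent variables. 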

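For the purpose of introducing $\mathbf{M}$ in the exponential term in (4) we shall 
define the “double dot multiplication” of two matrices: 

(8) $\mathbf{A} \cdot\cdot\, \mathbf{B} = (a_{ij}) \cdot\cdot\, (b_{ij}) = \sum_{i} \sum_{j} a_{ij}\, b_{ij}$, 

for which we notice the rule 

(9) $\mathbf{A} \cdot\cdot\, (\mathbf{BCD}) = \mathbf{C} \cdot\cdot\, (\mathbf{B}^*\mathbf{A}\mathbf{D}^*)$. 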
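Rule (9) can be spot-checked numerically. The following sketch uses plain nested lists and integer entries so that the two sides can be compared exactly; the helper names are illustrative, and a check on a few matrices is of course no substitute for the index computation that proves the rule.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    """Ordinary matrix product."""
    qt = transpose(q)
    return [[sum(x * y for x, y in zip(row, col)) for col in qt] for row in p]


def double_dot(p, q):
    """Definition (8): the sum of products of corresponding elements."""
    return sum(x * y for rp, rq in zip(p, q) for x, y in zip(rp, rq))


A = [[1, 2], [3, 4]]
B = [[0, 1], [2, -1]]
C = [[5, -2], [1, 3]]
D = [[2, 0], [1, 1]]

left = double_dot(A, matmul(matmul(B, C), D))                          # A .. (BCD)
right = double_dot(C, matmul(matmul(transpose(B), A), transpose(D)))   # C .. (B*AD*)
```

In matrix terms the double dot product is the trace of $\mathbf{A}^*\mathbf{B}$, and (9) is the cyclic property of the trace written in that notation.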

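As obviously 

$\mathbf{x}\mathbf{\Phi}\mathbf{x}^* = \sum \varphi_{ij}\, x_i x_j = \mathbf{\Phi} \cdot\cdot\, (\mathbf{x}^*\mathbf{x})$, 

we have 

(10) $\sum_{\nu} \mathbf{x}_\nu \mathbf{\Phi} \mathbf{x}_\nu^* = \mathbf{\Phi} \cdot\cdot\, \mathbf{M}$, 

and accordingly 

(11) $p\{\mathbf{M}, (\mathbf{U})\} = \left(\frac{\sqrt{\Delta(\mathbf{\Phi})}}{(\sqrt{2\pi})^{k}}\right)^{n} e^{-\frac{1}{2}\mathbf{\Phi} \cdot\cdot\, \mathbf{M}} \left|\frac{\partial(\mathbf{x}_1, \ldots, \mathbf{x}_n)}{\partial(\mathbf{M}, (\mathbf{U}))}\right|$, 

where $\frac{\partial(\ )}{\partial(\ )}$ denotes the jacobian of the transformation. On integrating with 
respect to $(\mathbf{U})$ we obtain 

(12) $p\{\mathbf{M}\} = (\sqrt{\Delta(\mathbf{\Phi})})^{n}\, e^{-\frac{1}{2}\mathbf{\Phi} \cdot\cdot\, \mathbf{M}}\, \varphi(\mathbf{M})$, 

where $\varphi(\mathbf{M})$ is independent of $\mathbf{\Phi}$. From this it follows that $p\{\mathbf{x}_1, \ldots, \mathbf{x}_n \mid \mathbf{M}\}$ 
is independent of $\mathbf{\Phi}$, i.e. $\mathbf{M}$ is a sufficient statistic for $\mathbf{\Phi}$. 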

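In order to determine the mathematical form of $\varphi(\mathbf{M})$ we shall apply an arbitrary 
linear transformation to the original variates: 

(13) $\mathbf{x}_\nu = \mathbf{x}_\nu' \mathbf{A}$. 

The new variates are obviously normally distributed about 0 with the distribution 
matrix 

(14) $\mathbf{\Phi}' = \mathbf{A}\mathbf{\Phi}\mathbf{A}^*$. 

Therefore the distribution function of the new moment matrix, given by 

(15) $\mathbf{M} = \mathbf{A}^*\mathbf{M}'\mathbf{A}$, 

is 

(16) $p\{\mathbf{M}'\} = (\sqrt{\Delta(\mathbf{\Phi}')})^{n}\, e^{-\frac{1}{2}\mathbf{\Phi}' \cdot\cdot\, \mathbf{M}'}\, \varphi(\mathbf{M}')$. 

On the other hand the transformation from $\mathbf{M}$ to $\mathbf{M}'$ is a linear one, the jacobian 
of which therefore is a constant depending on $\mathbf{A}$ only: 

(17) $\frac{\partial(\mathbf{M})}{\partial(\mathbf{M}')} = \psi(\mathbf{A})$, say. 

Consequently, 

(18) $p\{\mathbf{M}'\} = (\sqrt{\Delta(\mathbf{\Phi})})^{n}\, e^{-\frac{1}{2}\mathbf{\Phi} \cdot\cdot\, \mathbf{M}}\, \varphi(\mathbf{M}) \cdot |\psi(\mathbf{A})|$. 

The two expressions for $p\{\mathbf{M}'\}$ must be identical, and as 

(19) $\Delta(\mathbf{\Phi}') = \Delta(\mathbf{\Phi})\,\Delta^2(\mathbf{A})$, 

and 

(20) $\mathbf{\Phi}' \cdot\cdot\, \mathbf{M}' = (\mathbf{A}\mathbf{\Phi}\mathbf{A}^*) \cdot\cdot\, \mathbf{M}' = (\mathbf{A}^*\mathbf{M}'\mathbf{A}) \cdot\cdot\, \mathbf{\Phi} = \mathbf{M} \cdot\cdot\, \mathbf{\Phi}$, 

it follows that $\varphi(\mathbf{M})$ satisfies the functional equation 

(21) $|\Delta(\mathbf{A})|^{n}\, \varphi(\mathbf{M}') = \varphi(\mathbf{M}) \cdot |\psi(\mathbf{A})|$. 

Now, since the transformation $\mathbf{M} = (\mathbf{AB})^*\mathbf{M}'(\mathbf{AB})$ may be carried out in two 
steps, $\psi(\mathbf{A})$ also satisfies a functional equation 

(22) $\psi(\mathbf{AB}) = \psi(\mathbf{A})\,\psi(\mathbf{B})$. 

Furthermore, if $\mathbf{A}$ is a diagonal matrix it is easily seen that 

(23) $\psi(\mathbf{A}) = (\Delta(\mathbf{A}))^{k+1}$, 

and this relation holds generally. In fact, considering the case where the normal 
form of $\mathbf{A}$ is a diagonal matrix: 

$\mathbf{A} = \mathbf{T}\mathbf{D}\mathbf{T}^{-1}$, say, 

we get 

$\psi(\mathbf{A}) = \psi(\mathbf{T})\,\psi(\mathbf{D})\,\psi(\mathbf{T}^{-1})$ 
$= (\Delta(\mathbf{D}))^{k+1}\, \psi(\mathbf{T}\mathbf{T}^{-1})$ 
$= (\Delta(\mathbf{A}))^{k+1}$, 

and by analytical continuation this is seen to be true for any $\mathbf{A}$. 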
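Relation (23) lends itself to a direct numerical check for small k. The sketch below builds the matrix of the linear map $\mathbf{M}' \mapsto \mathbf{A}^*\mathbf{M}'\mathbf{A}$ on the $k(k+1)/2$ independent elements of a symmetric matrix and compares its determinant with $(\Delta(\mathbf{A}))^{k+1}$. The helper names and the test matrices are illustrative assumptions, and the determinant routine is a plain Gaussian elimination written for this example.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in m]
    n, d = len(m), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n):
                m[r][cc] -= f * m[col][cc]
    return d


def congruence_jacobian(A):
    """Determinant of d(M)/d(M') for M = A* M' A acting on symmetric matrices."""
    k = len(A)
    idx = [(i, j) for i in range(k) for j in range(i, k)]   # independent elements
    columns = []
    for p, q in idx:
        E = [[0.0] * k for _ in range(k)]
        E[p][q] = E[q][p] = 1.0                              # unit change in m'_{pq}
        M = [[sum(A[r][i] * E[r][s] * A[s][j] for r in range(k) for s in range(k))
              for j in range(k)] for i in range(k)]
        columns.append([M[i][j] for i, j in idx])
    jac = [list(row) for row in zip(*columns)]               # rows: elements of M
    return det(jac)


A2 = [[2, 1], [1, 3]]
A3 = [[1, 2, 0], [0, 1, 3], [1, 0, 2]]
psi2, psi3 = congruence_jacobian(A2), congruence_jacobian(A3)
```

For `A2` the determinant of A is 5 and k + 1 = 3, so `psi2` should be close to 125; for `A3` the determinant is 8 and k + 1 = 4, so `psi3` should be close to 4096.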





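Now, inserting this result in the functional equation (21) and taking for $\mathbf{A}$ the 
real symmetrical square root of $\mathbf{M}$,² so that $\mathbf{M}' = \mathbf{1}$, we readily obtain the 
solution 

(24) $\varphi(\mathbf{M}) = (\Delta(\mathbf{M}^{1/2}))^{n-k-1}\, \varphi(\mathbf{1})$. 

It follows that 

(25) $p\{\mathbf{M}\} = \gamma_k(n)\,(\Delta(\mathbf{\Phi}))^{n/2} \cdot e^{-\frac{1}{2}\mathbf{\Phi} \cdot\cdot\, \mathbf{M}} \cdot (\Delta(\mathbf{M}))^{(n-k-1)/2}$, 

where $\gamma_k(n) = \varphi(\mathbf{1})$ is a constant which may be determined in various ways (cf. 
for instance Cramér [4]). 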

3. Other applications of the linear transformation. It may be noticed that 
the linear transformation also leads to simple derivations of two fundamental 
properties of the normal multivariate distribution itself, viz. determination of 
the constant and the relation between the moment matrix and the distribution 
matrix. 

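Let 

(26) $p\{\mathbf{x}\} = \gamma(\mathbf{\Phi}) \cdot e^{-\frac{1}{2}\mathbf{x}\mathbf{\Phi}\mathbf{x}^*}$ 

and transform by 

(27) $\mathbf{x} = \mathbf{x}'\mathbf{A}$. 

The new variable obviously has the distribution matrix (14) and the constant 
$\gamma(\mathbf{\Phi}')$. But on the other hand direct transformation of (26) leads to 

$p\{\mathbf{x}'\} = \gamma(\mathbf{\Phi}) \cdot e^{-\frac{1}{2}\mathbf{x}\mathbf{\Phi}\mathbf{x}^*} \left|\frac{\partial(\mathbf{x})}{\partial(\mathbf{x}')}\right|$ 
$= \gamma(\mathbf{\Phi})\,|\Delta(\mathbf{A})|\, e^{-\frac{1}{2}\mathbf{x}'\mathbf{\Phi}'\mathbf{x}'^*}$, 

and therefore we must have 

$\gamma(\mathbf{\Phi}') = \gamma(\mathbf{\Phi})\,|\Delta(\mathbf{A})|$. 

For $\mathbf{A} = \mathbf{\Phi}^{-1/2}$ we get $\mathbf{\Phi}' = \mathbf{1}$ and consequently 

$\gamma(\mathbf{\Phi}) = \sqrt{\Delta(\mathbf{\Phi})} \cdot \gamma(\mathbf{1})$, 

where obviously 

$\gamma(\mathbf{1}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{k}$. 

Considering 

$\mathbf{M}(\mathbf{\Phi}) = \int \mathbf{x}^*\mathbf{x}\; p\{\mathbf{x}\}\, d\mathbf{x}$, 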

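the same transformation gives 

$\mathbf{M}(\mathbf{\Phi}) = \int \mathbf{A}^*\mathbf{x}'^*\mathbf{x}'\mathbf{A}\; p\{\mathbf{x}'\}\, d\mathbf{x}'$ 
$= \mathbf{A}^*\mathbf{M}(\mathbf{\Phi}')\mathbf{A}$, 

which for $\mathbf{A} = \mathbf{\Phi}^{-1/2}$ leaves us with 

$\mathbf{M}(\mathbf{\Phi}) = \mathbf{\Phi}^{-1}$ 

because $\mathbf{M}(\mathbf{1}) = \mathbf{1}$. 

² Exists because $\mathbf{M}$ is positive definite: Let $\mathbf{M} = \mathbf{O}\mathbf{D}\mathbf{O}^*$ where $\mathbf{O}$ is orthogonal and $\mathbf{D}$ 
the diagonal form of $\mathbf{M}$; then $\mathbf{M}^{1/2} = \mathbf{O}\mathbf{D}^{1/2}\mathbf{O}^*$ is real and symmetrical. 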

REFERENCES 

[1] J. Wishart, “The generalised product moment distribution in samples from a normal 
multivariate population”, Biometrika, Vol. 20A (1928), p. 32. 

[2] J. Wishart and M. S. Bartlett, “The generalised product moment distribution in a 
normal system”, Proc. Cambr. Phil. Soc., Vol. 29 (1933), p. 260. 

[3] P. L. Hsu, “A new proof of the joint product moment distribution”, Proc. Cambr. Phil. 
Soc., Vol. 35 (1939), p. 336. 

[4] H. Cramér, Mathematical Methods in Statistics, Princeton Univ. Press, 1946, pp. 392-93. 


THE DISTRIBUTION OF A DEFINITE QUADRATIC FORM 

By Herbert Robbins 
University of North Carolina 

Let $X_1, \ldots, X_n$ be independent normal variates with zero means and unit 
variances, let $a_1, \ldots, a_n$ be positive constants and define 


(i) 


u n «+ ••• 

F.(x) - Pr [U n < x], /„(*) = FUx). 


( 2 ) 

Setting 

(3) o= (ox---on) 17 " 

and using the convolution formula we may show by induction that for x > 0, 

(4) 


( 5 ) 

where for k « 0,1, • • • 

(6) c k « r 


fnW a x • ; r(i« + k) ’ 
fclo r(Jn + k + 1) 


til • • • inla} 1 * • •ai" 





In particular, if oi «= • • • = On = 2, then using the known distribution of x* with 
n degrees of freedom we have 


/«(*) 




(-*)* 


a*(-*)‘ 


2‘»r(Jw) 2 * n r( 4 n) ifo 2 k k\ 2 *" uorfjft + t)' 


so that 


3. _ fc ) _ «•"*" v rfe + *)••• r(t„ + i) 

2*A!r(in) 2* ’ 


and therefore 
(7) 


E 

<1+* * •+<«-•* 


r(ti + })--r(t, + } ) = x tn rQw + k) 

til •••in! &!r(£n) 


Now in the general case let 

(8) a. = min {«!,-•• , On}; 

then from (6) and (7) we deduce that 


(9) 


c k (-x) k 

r(Jn + k) 


(x/a) k 

rCin)/*:!’ 


with a similar inequality for the general term of (5). 

It is difficult to evaluate numerically the coefficient $c_k$ by a direct application 
of the definition (6). We shall therefore give a method by which the $c_k$ may be 
computed easily. We shall assume in what follows that the $a_i$ are distinct. 

Let $Y_1, \ldots, Y_n$ be another set of variates with the same joint distribution 
as the $X_i$ and independent of the $X_i$, and set 


(10) V ln = ^xl + ... +| n X 2 „ +^Yl + ... +^Y\, 

(11) <?>„(*) = Pr [7,» < x], g in (x) = Gi.(x). 

Then by the convolution formula, 

(12) gUx) = l Ux - y)Uv) dy = g jg • 

But we may show directly that, setting 

(13) = II («, — «*) -1 (i =!, ■ ■ ■, »), 


we have 


(14) ff,.(z) - (-1)"- 1 1, qtar'e-*"' = (-I)"" 1 Eli; gior*"*) { -£ • 
Equating coefficients in (12) and (14) we derive the fundamental formula 

(15) £ <nct-i = a" E q>o7 ll+t) (fc - 0,1, • • •). 

4.0 4.1 





Thus, defining 

(16) 2P k = a" 9.o7'* +1) , 

•-1 

we may write 

(17) Ec,Ci-, = 2P i . 

t _0 


From (6) or (17) it follows that 

(18) Co = 1. 


Thus we may solve (17) successively for the c* in terms of the P*: for j = 0,1, 


JO 


(19) C ' 2i ~ ^ | ri Ci J~ l + ° 2 C2 >- 2 + ” * + c i i ^ | 1 

C 2/+1 = f* 2 j+l — {CiCzi + C 2 C 2 /—1 + • • • + c,c, + i}. 


When the n constants $q_1, \ldots, q_n$ have been computed we may compute the 
$P_k$ by (16) and then the $c_k$ by (19) successively as far as desired. The inequality 
(9) gives an indication of the number of terms of the alternating series (4) or (5) 
which are necessary to secure a desired accuracy. A sharper result than (9) 
should certainly be possible when the $a_i$ are well separated. 
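The successive solution described above can be sketched in a few lines. The sketch assumes that relation (17) has the self-convolution form $\sum_{i=0}^{k} c_i c_{k-i} = 2P_k$ with $c_0 = 1$, which is how the scanned equations (17) to (19) appear to read; if the printed relation differs, the recursion must be adjusted accordingly. The function name and the test sequence are illustrative and the $P_k$ used here are synthetic, not computed from (16).

```python
def solve_self_convolution(P):
    """Solve sum_{i=0..k} c_i * c_{k-i} = 2 * P[k] successively, with c_0 = 1.

    Isolating the two end terms 2 * c_0 * c_k gives
    c_k = P[k] - (1/2) * sum_{i=1..k-1} c_i * c_{k-i}.
    """
    c = [1.0]
    for k in range(1, len(P)):
        cross = sum(c[i] * c[k - i] for i in range(1, k))
        c.append(P[k] - 0.5 * cross)
    return c


# Round trip on a made-up coefficient sequence: build P from c, then recover c.
c_true = [1.0, 0.5, -0.25, 2.0, 1.5]
P = [0.5 * sum(c_true[i] * c_true[k - i] for i in range(k + 1))
     for k in range(len(c_true))]
c_back = solve_self_convolution(P)
```

Each new coefficient needs only the ones already found, so the cost of computing the first m coefficients grows like m squared.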

The foregoing method may be modified to cover cases in which some of the 
$a_i$ are equal, the formulas (16) being replaced by the appropriate limits as the 
$a_i$ approach equality. 

We shall now derive an expansion of $f_n(x)$ and $F_n(x)$ in terms of $\chi^2$ distributions. 
Let us set for $x > 0$, 


( 20 ) 


1 -xla 


/.(a-) g ( l) d„ ain+£ r(in + k) , 


or, equivalently, 

a \ A 4 . x in+k " 1 e~ ix 
2 X ) S * dk 2*" 11 r(Jn + k) 

^-v 1, f. , . tt*rr(4n) 

2*» r(j«) s. K r(jw + k) 

where the dk are to be determined. It follows, after some reduction, that 

(22) »„(*) = a~" x"' 1 e~* u r; . 

k—o ( <-o J T(Jn + A:) 



But we may write (14) in the form 





Equating coefficients in (22) and (23) and setting 


(24) 


2 % = «E <?i07 ( * +n (a - a,)" +i ~ l . 


we obtain the relations do = 1 and 


(25) 


X= 2 Q, 


= 0, !,•••), 


from which the d k may be computed as in (19). Equation (20) or (21) then 
gives the expansion of f n (x) in a series of x* frequency functions. The corre¬ 
sponding expansion of F n (x) is then 


( 20 ) 


FuU) = X (-1)' (h ■ [' , 

0 Jo w 


.Jn+i- 


-tla 


‘" +i r(Jrt + /;) 


dt. 


or 

(27) 



90 /.X Al 

= 5 ' l 2*" H 


«+*-l e -tl 2 

K r(in + A*) 


<//. 


By direct comparison of (4) and (20) we may establish the following relations 
among the c k and (h : 



These may be used if both the power series and the x* series are desired. 
From (6) we see directly that 

(29) ft - s Z «r\ 

& t -1 


and from (28) it follows that 

(30) (h = l anbi , 
where we have set 

(31) 6. = | i £ «7 l 1 - (a, • • • aJ~ Um . 


Since by a well known inequality hi > 0 it follows that di > 0, with equality only 
if all the a : are equal. If we denote by h n (x) the frequency function of 
ia(Xf + • • • + Xl) then 



a-*"" 1 e~ lx 
2»»T(i n)’ 


(32) 





and hence (21) becomes 


(33) 


fn(x) 

kn(x) 


which is significant for x near 0. 


1 — bi x + • • • , 


EXACT LOWER MOMENTS OF ORDER STATISTICS IN SMALL SAMPLES 
FROM A NORMAL DISTRIBUTION 

By Howard L. Jones 
Illinois Bell Telephone Company 

1. Summary. Exact means in samples of size $\le 3$, and exact second moments 
and product-moments in samples of size $\le 4$, are given in Table 1 in terms of 
$\pi$ for order statistics selected from the normal distribution N(0, 1). The derivation 
employs multiple integration and some general properties of the moments. 

TABLE 1 

Expected values of lower moments of order statistics, $x_i > x_{i+1}$, 
in samples of size n from the normal distribution N(0, 1). 

Moment          n = 2              n = 3                          n = 4 
$E[x_1]$        $1/\sqrt{\pi}$     $3/(2\sqrt{\pi})$ 
$E[x_2]$                           $0$ 
$E[x_1^2]$      $1$                $1 + \sqrt{3}/(2\pi)$          $1 + \sqrt{3}/\pi$ 
$E[x_2^2]$                         $1 - \sqrt{3}/\pi$             $1 - \sqrt{3}/\pi$ 
$E[x_1 x_2]$    $0$                $\sqrt{3}/(2\pi)$              $\sqrt{3}/\pi$ 
$E[x_1 x_3]$                       $-\sqrt{3}/\pi$                $-(2\sqrt{3} - 3)/\pi$ 
$E[x_1 x_4]$                                                      $-3/\pi$ 
$E[x_2 x_3]$                                                      $(2\sqrt{3} - 3)/\pi$ 
$\sigma_1^2$    $1 - 1/\pi$        $1 - (9 - 2\sqrt{3})/(4\pi)$ 
$\sigma_2^2$                       $1 - \sqrt{3}/\pi$ 
$\sigma_{12}$   $1/\pi$            $\sqrt{3}/(2\pi)$ 
$\sigma_{13}$                      $(9 - 4\sqrt{3})/(4\pi)$ 

2. Introduction. The usefulness of the lower moments of order statistics for
determining the moments of the range and for other purposes is well established.
In small samples, however, computation of the moments by quadrature is laborious
[1]. The values shown in Table 1 should therefore be helpful in problems
requiring the use of these moments for samples of size ≤ 4, since the constant π
has been evaluated to several hundred decimal places. Some of the methods
used to obtain these results may also be useful in approximating or verifying
the moments in larger samples.
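The closed forms of Table 1 can be spot-checked by quadrature. The sketch below is not part of the original derivation: it assumes the standard density n Φ(x)^(n-1) φ(x) for the largest of n independent N(0, 1) observations and integrates it with Simpson's rule. The function names, the integration limits and the step count are illustrative choices only.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def moment_of_largest(n, power=1, lo=-10.0, hi=10.0, steps=4000):
    """E[x_1^power] for the largest of n N(0,1) draws, by Simpson's rule.

    Assumes the density of the largest value is n * Phi(x)^(n-1) * phi(x);
    `steps` must be even.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        if i == 0 or i == steps:
            weight = 1
        elif i % 2 == 1:
            weight = 4
        else:
            weight = 2
        total += weight * (x ** power) * n * Phi(x) ** (n - 1) * phi(x)
    return total * h / 3.0

# Compare the quadrature with the closed forms quoted in Table 1.
checks = [
    ("E[x_1], n = 2", moment_of_largest(2), 1 / math.sqrt(math.pi)),
    ("E[x_1], n = 3", moment_of_largest(3), 3 / (2 * math.sqrt(math.pi))),
    ("E[x_1^2], n = 3", moment_of_largest(3, 2), 1 + math.sqrt(3) / (2 * math.pi)),
    ("E[x_1^2], n = 4", moment_of_largest(4, 2), 1 + math.sqrt(3) / math.pi),
]
for label, numeric, closed in checks:
    print(f"{label}: quadrature {numeric:.8f}, closed form {closed:.8f}")
```

The same routine could be pointed at larger n, where no closed form in π alone is claimed, to obtain approximate values.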







3. Multiple integration. Let n random selections from the normal distribution
N(0, 1) be arranged in order of size so that

    $x_1 > x_2 > \cdots > x_n .$

For samples of size 2, the means and product-moment are easily obtained from
the general formula

    $E[x_i^k x_j^h] = n! \int_{-\infty}^{\infty} \int_{x_n}^{\infty} \cdots \int_{x_2}^{\infty} x_i^k x_j^h \, f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \, dx_2 \cdots dx_n ,$

    $f(x) = (2\pi)^{-1/2} e^{-x^2/2} ,$

$E[x_i^k]$ being the special case where h = 0. Multiple integration can also be
used to find any product-moment, $E[x_i x_j]$ with $i \ne j$, for samples of size 3, the order of
integration being changed at any stage where necessary.

For the means in samples of size 3 and the product-moments in samples of 
size 4, the integrals reduce to double integrals which can be evaluated from the 
equation 


r r <ui dt, 

J-oo Jt , 2 ab 


This equation follows from the fact that 


is equivalent to 


while the function 


f f dhdt, 

j -00 Jbt 2 la TT 

j f o £ djh dp,, 

pbt 2 ta 

I dh 


has the symmetrical property that <t> (k) == —0(—k), whence 


f dh = 0. 

•'-OO 


4. Some properties of the moments. The most obvious property of the
moments of order statistics in samples from the normal distribution N(0, 1) is
their symmetry; thus:

    $E[x_i] = -E[x_{n-i+1}],$

    $E[x_i^2] = E[x_{n-i+1}^2],$

    $E[x_i x_j] = E[x_{n-i+1} x_{n-j+1}].$





When sample values from any parent distribution are numbered in order of
random selection, $x_i$ and $x_j$ are statistically independent of each other, and
the expected value of a product $x_i^k x_j^h$ is the product of the expected values of
$x_i^k$ and $x_j^h$. Numbering in order of size has the effect of increasing some expected
values and decreasing others, leaving the sum of expected values of a given type
unchanged, so that in general,

E E (MxU)}) = (i)E[M*i] 

<«i j-i+i W 

where $x_0$ is a random selection. In particular, this equation holds for the special
cases (k = 1, h = 1), (k = 1, h = 0), and (k = 2, h = 0); so that in samples
from the normal distribution N(0, 1),

    $\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[x_i x_j] = \tfrac{1}{2} n(n-1) (E[x_0])^2 = 0,$

    $\sum_{i=1}^{n} E[x_i] = n E[x_0] = 0,$

    $\sum_{i=1}^{n} E[x_i^2] = n E[x_0^2] = n.$

The foregoing relationships lead immediately to the evaluation of $E[x_1 x_2]$ and
$E[x_1^2]$ in samples of size 2. (The generalization of these relationships was suggested
by Professor John H. Smith, whose unpublished manuscript on sampling
from a rectangular distribution has also been instructive.)

In samples from a normal distribution, the covariance of every order statistic
with the sample mean is the same as the variance of the sample mean. This
implies that the variance of the sample mean ≤ the variance of any order statistic,
the ratio of one standard deviation to the other being equal to the coefficient
of correlation between the sample mean and the order statistic. To
derive these properties, consider the linear function

    $m = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$

of the order statistics $X_1, X_2, \ldots, X_n$ in a sample selected from the normal
distribution N(μ, σ) with unknown μ and σ. Let

    $x_i = (X_i - \mu)/\sigma, \quad i = 1, 2, \ldots, n.$

The conditions $w_1 + w_2 + \cdots + w_n = 1$ and $w_{n-i+1} = w_i$ are sufficient to make
m an unbiased estimate of μ with variance $\sigma^2 E[(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n)^2]$.
The w's that make this variance minimum must satisfy the equations obtained
by replacing $w_i$ with $w_{n-i+1}$, for $i > \tfrac{1}{2}(n + 1)$, in the expression

    $E[(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n)^2] + \lambda (w_1 + w_2 + \cdots + w_n - 1)$

and then setting the partial derivative with respect to each w equal to zero.
This leads to

    $\sum_{j=1}^{n} w_j E[x_i x_j] + \sum_{j=1}^{n} w_j E[x_{n-i+1} x_j] + \lambda = 0, \quad 1 \le i \le n,$



where the summations include the terms $E[x_i^2]$ and $E[x_{n-i+1}^2]$, respectively. But
it is known [2] that the sample mean is the regular unbiased estimate of μ with
minimum variance. Setting each w equal to 1/n and combining equivalent
terms yields

    $\sum_{j=1}^{n} E[x_i x_j] + \tfrac{1}{2} n \lambda = 0, \quad i = 1, 2, \ldots, n.$

Summing from i = 1 to i = n, and employing the relationships discussed in the
preceding paragraph, we obtain

    $n + \tfrac{1}{2} n^2 \lambda = 0,$

whence

    $\lambda = -2/n,$

and

    $\sum_{j=1}^{n} E[x_i x_j] = 1, \quad i = 1, 2, \ldots, n,$

where the summation includes the term $E[x_i^2]$. This equation leads to the properties
mentioned at the beginning of this paragraph. The same equation can
be used to evaluate $E[x_1^2]$ and $E[x_2^2]$ in samples of size 3 or 4 from the distribution
N(0, 1), after the product-moments have been found.
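The relation that the product-moments of any one order statistic with all n order statistics sum to 1 can be checked against the tabulated values. The sketch below is illustrative only: it takes the entries of Table 1 for n = 3 and n = 4 as read from the scan, fills in the remaining moments by the symmetry $E[x_i x_j] = E[x_{n-i+1} x_{n-j+1}]$, and sums each row. The dictionary layout and function names are assumptions of this example.

```python
import math

SQRT3 = math.sqrt(3.0)
PI = math.pi

# Tabulated moments E[x_i x_j] for i <= j (Table 1, as read from the scan).
TABLE = {
    3: {
        (1, 1): 1 + SQRT3 / (2 * PI),
        (1, 2): SQRT3 / (2 * PI),
        (1, 3): -SQRT3 / PI,
        (2, 2): 1 - SQRT3 / PI,
    },
    4: {
        (1, 1): 1 + SQRT3 / PI,
        (1, 2): SQRT3 / PI,
        (1, 3): -(2 * SQRT3 - 3) / PI,
        (1, 4): -3 / PI,
        (2, 2): 1 - SQRT3 / PI,
        (2, 3): (2 * SQRT3 - 3) / PI,
    },
}

def product_moment(n, i, j):
    """E[x_i x_j], using the table directly or its mirror image under symmetry."""
    a, b = min(i, j), max(i, j)
    base = TABLE[n]
    if (a, b) in base:
        return base[(a, b)]
    # Symmetry of the normal distribution: reflect both indices.
    return base[(n - b + 1, n - a + 1)]

def row_sums(n):
    """Sum over j of E[x_i x_j], for each i = 1, ..., n."""
    return [sum(product_moment(n, i, j) for j in range(1, n + 1))
            for i in range(1, n + 1)]

for n in (3, 4):
    print(n, [round(s, 12) for s in row_sums(n)])
```

Each row sum should come out as 1, and the diagonal entries should add up to n, in line with the relations stated in this section.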


REFERENCES 

[1] C. Hastings, Jr., F. Mosteller, J. W. Tukey, and C. P. Winsor, “Low moments for
small samples; a comparative study of order statistics,” Annals of Math. Stat.,
Vol. 18 (1947), pp. 413-426.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 483.


NOTE ON AN ASYMPTOTIC EXPANSION OF THE nTH DIFFERENCE
OF ZERO

By L. C. Hsu

National Tsing Hua University, Peiping, China

This note gives an asymptotic expansion of the nth difference of zero. It is 
known that the Stirling number S n ,« of the second kind is defined by 


( 1 ) 


= a" o* = 

x*>0 v*v 


We shall first show that the Stirling number $S_{n,n+k}$ can be expanded in the
form

(2)    $S_{n,n+k} = \frac{n^{2k}}{2^k \, k!} \left[ 1 + \frac{f_1(k)}{n} + \frac{f_2(k)}{n^2} + \cdots + \frac{f_t(k)}{n^t} + O(n^{-t-1}) \right]$

where $f_1, f_2, \ldots, f_t$ are polynomials in k whose coefficients can be found
by means of the following lemmas.

The first lemma is due to B. F. Kimball, [1, (5.3)]. 

Lemma 1. (Kimball) Let q be a real number such that n + q > 0, and let
$f(x) = x^{n+q}$. Then we can write $\Delta^n f(x)$ in the form

(3) 4"/M - /"'(* + W [l +| W(n ,«)], 

where the value of W(m, n) is given by 

(4) W(m, n) - 


B7 n (x) being a so-called Bernoulli polynomial of negative order which was first 
defined by Norland [2], 

Lemma 2. Let the sum of all products of k different numbers taken from
the set (1, 2, …, n) be denoted by $S_k(n)$. Then we can express it in the form

(5)    $S_k(n) = \sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \binom{n+p}{k+p} ,$

where the coefficients $\lambda_1(k), \lambda_2(k), \ldots$ satisfy the recurrence relation

(6)    $(k + p)\lambda_{p-1}(k) + p \cdot \lambda_p(k) = \lambda_p(k + 1)$

with $\lambda_0 \equiv 0$, $\lambda_1 \equiv 1$ and $\lambda_{k+1}(k) = 0$.

Proof. Clearly, among all $\binom{n}{k+1}$ products of (k + 1) numbers out of
(1, 2, …, n), there are exactly $\binom{n-1}{k}$ products containing the greatest factor n.
The sum of these products is therefore $n \cdot S_k(n - 1)$. Repeating this reasoning,
we get

(7)    $S_{k+1}(n) = n \cdot S_k(n - 1) + (n - 1) \cdot S_k(n - 2) + \cdots + (k + 1) \cdot S_k(k).$

Evidently, (5) is true for k = 1. Suppose now that it is true for some k. Then
the right-hand side of (7) can be written as

    $\sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \left[ (k + p + 1) \binom{n+p+1}{k+p+2} - p \binom{n+p}{k+p+1} \right]$

    $= \sum_{p=1}^{k+1} (-1)^{k+1-p} \left[ (k + p)\lambda_{p-1}(k) + p \cdot \lambda_p(k) \right] \binom{n+p}{k+p+1} .$

The lemma thus follows by induction on k.
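Lemma 2 lends itself to a direct numerical check. The sketch below builds the coefficients λ_p(k) from recurrence (6), evaluates the binomial form of S_k(n) as it is reconstructed here, namely the sum over p of (-1)^(k-p) λ_p(k) C(n+p, k+p), and compares it with a brute-force sum over all products of k distinct numbers from 1, …, n. The binomial form is an assumption inferred from the explicit formulas the note gives for small k; the function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod

def lambda_table(kmax):
    """Return {k: [lambda_0(k), ..., lambda_{k+1}(k)]} built from recurrence (6)."""
    lam = {1: [0, 1, 0]}  # lambda_0 = 0, lambda_1(1) = 1, lambda_2(1) = 0
    for k in range(1, kmax):
        prev = lam[k]
        row = [0] * (k + 3)  # indices 0 .. k+2; the last entry stays 0
        for p in range(1, k + 2):
            # (k + p) * lambda_{p-1}(k) + p * lambda_p(k) = lambda_p(k + 1)
            row[p] = (k + p) * prev[p - 1] + p * prev[p]
        lam[k + 1] = row
    return lam

def s_closed(k, n, lam):
    """S_k(n) from the binomial form assumed for equation (5)."""
    return sum((-1) ** (k - p) * lam[k][p] * comb(n + p, k + p)
               for p in range(1, k + 1))

def s_brute(k, n):
    """S_k(n) by summing every product of k distinct numbers from 1..n."""
    return sum(prod(c) for c in combinations(range(1, n + 1), k))

lam = lambda_table(5)
for k in range(1, 6):
    print(k, lam[k][1:k + 1], s_closed(k, 8, lam), s_brute(k, 8))
```

The brute-force sum grows combinatorially, so it is only meant for small n; the closed form is what would be used in practice.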





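The number $S_k(n)$ may be called a Stirling number of the first kind. By the
lemma just proved, it is easy to find

    $S_2(n) = 3\binom{n+2}{4} - \binom{n+1}{3}$

    $S_3(n) = 15\binom{n+3}{6} - 10\binom{n+2}{5} + \binom{n+1}{4}$

(8)    $S_4(n) = 105\binom{n+4}{8} - 105\binom{n+3}{7} + 25\binom{n+2}{6} - \binom{n+1}{5}$

    $S_5(n) = 945\binom{n+5}{10} - 1260\binom{n+4}{9} + 490\binom{n+3}{8} - 56\binom{n+2}{7} + \binom{n+1}{6} .$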

We shall see that in order to compute the coefficients of $f_1(k), f_2(k), \ldots$, it is
sufficient to compute the values of W(m, n), $\lambda_1(m), \lambda_2(m), \ldots$, (m = 1, 2, …, t).
Let $f(x) = x^{n+k}$. Then by Lemma 1, we have

+w ]_{‘ + £ ">]• 

From the definition of $S_k(n)$ it is easily seen that

    $(n + k)(n + k - 1) \cdots (n + 1) = n^k + n^{k-1} S_1(k) + \cdots + n S_{k-1}(k) + S_k(k).$


Hence we may write 


i r f-fy +wi - (.+**++... +m 

n!Ldx n o 2 k -k\\ n n 2 n* / 


It is clear from Kimball’s paper [1] that 




n) + 0(n-‘-‘). 


Substituting, we obtain 


"£*l[ 1 + £ (i 2m) W(m ’ n) + 0(n_ ‘' 1) ] 

■\l+i^+0(.n-')] 

L mTi n m J 


■[ 1 +£,g(»t p p)F^r- +0< ”'"‘ ) ]- 





The last expression shows that the asymptotic expansion (2) can be obtained
by computing the numbers $\lambda_p(m)$, W(m, n) with 1 ≤ p ≤ m ≤ t. For example,
consider the case t = 3 and notice that [1, (2.13)]


m n 





*•(-£)-B4 +0(A 


and that $\lambda_1 \equiv 1$, $\lambda_2(2) = 3$, $\lambda_2(3) = 10$, $\lambda_3(3) = 15$. Then by a straightforward
calculation of the right-hand side of (9) and by comparison with (2), we find

(10)
    $f_1(k) = \tfrac{1}{3}(2k^2 + k)$

    $f_2(k) = \tfrac{1}{18}(4k^4 - k^2 - 3k)$

    $f_3(k) = \tfrac{1}{810}(40k^6 - 60k^5 - 2k^4 - 63k^3 + 133k^2 - 48k).$
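The three-term expansion can be compared with exact Stirling numbers of the second kind. In the sketch below, $S_{n,n+k}$ is taken to be the number of partitions of n + k objects into n non-empty classes, computed by the usual triangular recurrence, and the polynomials f_1, f_2, f_3 of (10) are entered with denominators 3, 18 and 810. Those denominators and two coefficients of f_3 (63 and 133) are illegible in the scan and are assumed here because they reproduce the exact values for k = 1, 2, 3. All names are illustrative.

```python
from fractions import Fraction
from math import factorial

def stirling2(m, n):
    """Stirling number of the second kind: partitions of m items into n blocks."""
    row = [1] + [0] * n          # row[j] = S(0, j)
    for _ in range(m):
        new = [0] * (n + 1)
        for j in range(1, n + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[n]

def f1(k):
    return Fraction(2 * k**2 + k, 3)

def f2(k):
    return Fraction(4 * k**4 - k**2 - 3 * k, 18)

def f3(k):
    return Fraction(40 * k**6 - 60 * k**5 - 2 * k**4
                    - 63 * k**3 + 133 * k**2 - 48 * k, 810)

def expansion(n, k):
    """Right-hand side of (2) truncated after the f_3 term (t = 3)."""
    lead = Fraction(n ** (2 * k), 2**k * factorial(k))
    return lead * (1 + f1(k) / n + f2(k) / n**2 + f3(k) / n**3)

for n, k in [(10, 1), (10, 2), (50, 3), (50, 5)]:
    exact = stirling2(n + k, n)
    approx = expansion(n, k)
    print(n, k, exact, float(approx / exact))
```

For k = 1 and k = 2 the truncated series happens to be exact, since $S_{n,n+k}$ is then a polynomial in n of low enough degree; for larger k the ratio approaches 1 as n grows.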


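Finally, combining (2) with the well-known Stirling's formula [3]

(11)    $n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \left[ 1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O(n^{-4}) \right] ,$

and noting (1), we obtain

(12.1)    $\Delta^n 0^{n+k} = \frac{\sqrt{2\pi n}}{k!} \left(\frac{n^2}{2}\right)^k \left(\frac{n}{e}\right)^n \left[ 1 + \frac{g_1(k)}{n} + \frac{g_2(k)}{n^2} + \frac{g_3(k)}{n^3} + O(n^{-4}) \right] ,$

where $g_1(k)$, $g_2(k)$, $g_3(k)$ are polynomials in k, viz.

(12.2)
    $g_1(k) = \tfrac{1}{12}(8k^2 + 4k + 1),$

    $g_2(k) = \tfrac{1}{288}(64k^4 - 40k + 1),$

    $g_3(k) = \tfrac{1}{51840}(2560k^6 - 3840k^5 + 832k^4 - 4032k^3 + 8392k^2 - 3732k - 139).$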

The asymptotic formula for $\Delta^n 0^{n+k}$ just derived is much better than a result
previously obtained [4]. Moreover, it may be noted that the asymptotic expansion
of $S_{n,n+k}$ may be made as sharp as desired, since in fact, for any prescribed
t ≥ 1, $\lambda_p(m)$ and the values at $-\tfrac{1}{2}n$ of the Bernoulli polynomials of negative
order, (1 ≤ m ≤ t), may be easily computed by
(6) and Kimball's [1, (2.12)] respectively.


REFERENCES

[1] B. F. Kimball, “The application of Bernoulli polynomials of negative order to differencing,”
Amer. Math. Jour., Vol. 56 (1933), pp. 399-416.

[2] N. E. Nörlund, Differenzenrechnung, Julius Springer, 1924, p. 138.

[3] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge Univ. Press, p. 263.

[4] L. C. Hsu, “Some combinatorial formulas with applications to mathematical expectations
and to differences of zero,” Annals of Math. Stat., Vol. 16 (1944).


AN INEQUALITY FOR KURTOSIS 

By Louis Guttman 
Cornell University 

1. Summary. It is well known that, if the fourth moment about the mean 
of a frequency distribution equals the square of the variance, then the frequencies 
are piled up at exactly two points, namely, the two points that are one standard 
deviation away from the mean. In this paper is developed a general inequality 
which describes the piling up of frequency around these two points for the case 
where the fourth moment exceeds the square of the variance. In a sense, it is
shown how “U-shaped” a distribution must be according to its second and fourth
moments.

2. An inequality. Let x be a random variable whose distribution has the
following moments:

(1)    $\mu = E(x); \quad \sigma^2 = E(x - \mu)^2; \quad (\alpha^2 + 1)\sigma^4 = E(x - \mu)^4.$

$\alpha^2$ is non-negative for any distribution, and its positive square root will be denoted
by α. Let

(2)    $t = (x - \mu)/\sigma.$

It will be shown that, if λ is an arbitrary positive number, then

(3)    $\mathrm{Prob}\{1 - \lambda\alpha \le t^2 \le 1 + \lambda\alpha\} > 1 - \lambda^{-2}.$

If λ is chosen so as to make the left member in the braces positive, then $t^2$ is
bounded away from zero, and (3) becomes:

(4)    $\mathrm{Prob}\{\sqrt{1 - \lambda\alpha} \le |t| \le \sqrt{1 + \lambda\alpha}\} > 1 - \lambda^{-2}, \quad (\lambda\alpha < 1).$

For example, if α = .5 and λ = √2, then (4) shows that the probability is
greater than .50 that t is either between .54 and 1.31, or between -1.31 and -.54.
If α = .2 and λ = 3, then (4) shows that the probability is greater than .88 that
t is either between .63 and 1.27, or between -1.27 and -.63. In general, the
smaller α is, the greater the probability that t is in a small interval around +1 or
-1. In particular, if α = 0, then λ may be taken arbitrarily large, so that (4)
shows that the probability is unity that t = ±1; this is the well known case
referred to above.
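The bounds of inequality (4) are simple to evaluate, and the inequality itself can be checked exactly on any small discrete distribution. The sketch below is illustrative and not taken from the paper: `t_interval` returns the two limits for |t| together with the probability floor 1 - λ^(-2), and `prob_t2_in_band` computes the actual probability that t² falls in the band of inequality (3) for a discrete distribution supplied by the caller. The fair die used at the end is an arbitrary example.

```python
import math

def t_interval(alpha, lam):
    """Limits for |t| and the probability floor in inequality (4)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if lam * alpha >= 1:
        raise ValueError("inequality (4) requires lam * alpha < 1")
    return (math.sqrt(1 - lam * alpha),
            math.sqrt(1 + lam * alpha),
            1 - lam ** -2)

def moments(values, probs):
    """Mean, standard deviation and alpha of a discrete distribution."""
    mu = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
    m4 = sum(p * (v - mu) ** 4 for v, p in zip(values, probs))
    # alpha^2 = (fourth moment) / sigma^4 - 1, which is never negative.
    alpha = math.sqrt(max(m4 / var ** 2 - 1.0, 0.0))
    return mu, math.sqrt(var), alpha

def prob_t2_in_band(values, probs, lam):
    """Exact probability that 1 - lam*alpha <= t^2 <= 1 + lam*alpha."""
    mu, sigma, alpha = moments(values, probs)
    lo, hi = 1 - lam * alpha, 1 + lam * alpha
    return sum(p for v, p in zip(values, probs)
               if lo <= ((v - mu) / sigma) ** 2 <= hi)

print(t_interval(0.5, math.sqrt(2)))
print(t_interval(0.2, 3))

die = list(range(1, 7))
weights = [1 / 6] * 6
print(prob_t2_in_band(die, weights, 1.1), 1 - 1.1 ** -2)
```

For the die, the exact probability is well above the floor, which is typical: the inequality is distribution-free and therefore conservative.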




3. Derivation. Inequality (3) is a special case of a slightly more general
inequality which follows very simply from that of Tchebychef. Consider the
function $t^2 - 1 + c$, where c is an arbitrary real number. By using (1) and (2),
it is seen that

(5)    $E(t^2 - 1 + c)^2 = \alpha^2 + c^2.$

Then, according to Tchebychef's inequality, if λ is an arbitrary positive number,

(6)    $\mathrm{Prob}\{(t^2 - 1 + c)^2 \le \lambda^2(\alpha^2 + c^2)\} > 1 - \lambda^{-2},$

or,

(7)    $\mathrm{Prob}\{1 - c - \lambda\sqrt{\alpha^2 + c^2} \le t^2 \le 1 - c + \lambda\sqrt{\alpha^2 + c^2}\} > 1 - \lambda^{-2}.$

This is the general inequality that was to be shown.

Inequality (3) is obtained by setting c = 0 in (7).

Another special case is obtained by determining c so as to maximize the left
member in the braces of (7). By differentiation, the maximizing value is found
to be $c = -\alpha/\sqrt{\lambda^2 - 1}$, for which (7) becomes:

(8)    $\mathrm{Prob}\{1 - \alpha\beta \le t^2 \le 1 + \alpha(\beta^2 + 2)/\beta\} > 1 - 1/(\beta^2 + 1),$

where β is used instead of the notation $\sqrt{\lambda^2 - 1}$, and denotes any positive number.
For the same probability on the right, (8) has the advantage over (3) of
having 1 - αβ greater than 1 - λα, so that the former may be positive even
though the latter is negative. Inequality (8) starts the positive interval for t as
close to +1 as possible. On the other hand, (3) provides the minimum size interval
for $t^2$ from among all values of c that make the left member in the braces of (7)
positive.

If it is desired to have the positive interval for t end as close to +1 as possible,
then the right member in the braces of (7) is to be minimized. By differentiation,
the minimizing value is found to be $c = \alpha/\sqrt{\lambda^2 - 1}$, and the minimum
inequality is:

(9)    $\mathrm{Prob}\{1 - \alpha(\beta^2 + 2)/\beta \le t^2 \le 1 + \alpha\beta\} > 1 - 1/(\beta^2 + 1).$
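The choice of c in the general inequality (7) can be explored numerically. The sketch below, an illustration rather than anything given in the paper, evaluates the two limits of (7) for a chosen c, applies the stated optimum c = -α/√(λ² - 1), and confirms by a coarse grid search that no other c on the grid gives a larger left limit. The function names and the grid are arbitrary.

```python
import math

def limits_7(alpha, lam, c):
    """Left and right members inside the braces of inequality (7)."""
    radius = lam * math.hypot(alpha, c)   # lam * sqrt(alpha^2 + c^2)
    return 1 - c - radius, 1 - c + radius

def c_for_largest_left(alpha, lam):
    """Value of c stated to maximize the left member; needs lam > 1."""
    if lam <= 1:
        raise ValueError("lam must exceed 1 for beta = sqrt(lam^2 - 1) > 0")
    return -alpha / math.sqrt(lam ** 2 - 1)

alpha, lam = 0.5, 2.0
beta = math.sqrt(lam ** 2 - 1)
c_star = c_for_largest_left(alpha, lam)
left, right = limits_7(alpha, lam, c_star)

print("c* =", c_star)
print("left  =", left, " vs 1 - alpha*beta =", 1 - alpha * beta)
print("right =", right, " vs 1 + alpha*(beta^2 + 2)/beta =",
      1 + alpha * (beta ** 2 + 2) / beta)

# Coarse grid search over c in [-3, 3] for the largest left member.
best_on_grid = max(limits_7(alpha, lam, i / 100)[0] for i in range(-300, 301))
print("largest left member on grid:", best_on_grid)
```

By symmetry, replacing c with -c swaps the roles of the two limits, which is how inequality (9) arises from (8).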

4. Distribution Around μ. If the left member in the braces of (7) is negative,
then instead of giving information about the piling up of probability of t around
+1 and -1, (7) provides a statement about the probability in an interval around
μ. Alternatively, this may be regarded as a confidence interval for μ. The
minimum interval is given by (9); actually, it holds regardless of the value of the
left member in the braces; another way of stating it is:

(10)    $\mathrm{Prob}\{-\sqrt{1 + \alpha\beta} \le t \le \sqrt{1 + \alpha\beta}\} > 1 - 1/(\beta^2 + 1).$





TABLE FOR ESTIMATING THE GOODNESS OF FIT OF EMPIRICAL 

DISTRIBUTIONS 

By N. Smirnov 

1. Editorial Note. The table presented on pp. 280-281 was originally published
in [1]. It gives values of

    $L(z) = 1 - 2 \sum_{\nu=1}^{\infty} (-1)^{\nu-1} e^{-2\nu^2 z^2} = (2\pi)^{1/2} z^{-1} \sum_{\nu=1}^{\infty} e^{-(2\nu-1)^2 \pi^2 / (8z^2)}$

which is also derived in [2].

Let $(X_1, \ldots, X_n)$ be a sample of independent variables with the same continuous
cumulative distribution function F(x), and let N(z) be the number of
$X_k$ which are ≤ z. By empirical distribution is meant the step-function
$F_n^*(z) = N(z)/n$. The maximum $D_n$ of the difference $|F_n^*(z) - F(z)|$ is a
random variable and L(z) is the limiting cumulative distribution function of
$n^{1/2} D_n$. If $D_{m,n}$ is the maximum of the difference $|F_m^*(z) - F_n^*(z)|$ between
the empirical distributions of two independent samples of sizes m and n, respectively,
then L(z) is also the limiting cumulative distribution function of
$(mn/(m + n))^{1/2} D_{m,n}$.
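Values of L(z) between or beyond the tabulated arguments can be computed directly from either series. The sketch below is a minimal illustration: `L_series` sums the alternating exponential series and `L_alt` sums the second form, which converges faster for small z. The number of terms and the function names are arbitrary choices of this example, not part of the original table's construction.

```python
import math

def L_series(z, terms=100):
    """L(z) = 1 - 2 * sum_{v>=1} (-1)^(v-1) * exp(-2 v^2 z^2)."""
    if z <= 0:
        return 0.0
    total = 0.0
    for v in range(1, terms + 1):
        sign = 1.0 if v % 2 == 1 else -1.0
        total += sign * math.exp(-2.0 * v * v * z * z)
    return 1.0 - 2.0 * total

def L_alt(z, terms=100):
    """L(z) = sqrt(2 pi)/z * sum_{v>=1} exp(-(2v-1)^2 pi^2 / (8 z^2))."""
    if z <= 0:
        return 0.0
    total = 0.0
    for v in range(1, terms + 1):
        total += math.exp(-((2 * v - 1) ** 2) * math.pi ** 2 / (8.0 * z * z))
    return math.sqrt(2.0 * math.pi) / z * total

for z in (0.28, 0.50, 1.00, 2.00):
    print(f"z = {z:.2f}: series {L_series(z):.6f}, alternative form {L_alt(z):.6f}")
```

The two forms should agree to rounding error, which gives an internal check that is independent of the printed table.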


REFERENCES 

[1] N. Smirnov, “On the estimation of the discrepancy between empirical curves of distribution
for two independent samples,” Bulletin Mathématique de l'Université
de Moscou, Vol. 2 (1939), fasc. 2.

[2] W. Feller, “On the Kolmogorov-Smirnov limit theorems for empirical distributions,”
Annals of Math. Stat., Vol. 19 (1948), pp. 177-189.



TABLE of L(z)

   z     L(z)           z     L(z)           z     L(z)

  .28  .000 001
  .29  .000 004        .69  .272 189       1.09  .814 342
  .30  .000 009        .70  .288 765       1.10  .822 282
  .31  .000 021        .71  .305 471       1.11  .829 950
  .32  .000 046        .72  .322 265       1.12  .837 356
  .33  .000 091        .73  .339 113       1.13  .844 502
  .34  .000 171        .74  .355 981       1.14  .851 394
  .35  .000 303        .75  .372 833       1.15  .858 038
  .36  .000 511        .76  .389 640       1.16  .864 442
  .37  .000 826        .77  .406 372       1.17  .870 612
  .38  .001 285        .78  .423 002       1.18  .876 548
  .39  .001 929        .79  .439 505       1.19  .882 258
  .40  .002 808        .80  .455 857       1.20  .887 750
  .41  .003 972        .81  .472 041       1.21  .893 030
  .42  .005 476        .82  .488 030       1.22  .898 104
  .43  .007 377        .83  .503 808       1.23  .902 972
  .44  .009 730        .84  .519 366       1.24  .907 648
  .45  .012 590        .85  .534 682       1.25  .912 132
  .46  .016 005        .86  .549 744       1.26  .916 432
  .47  .020 022        .87  .564 546       1.27  .920 556
  .48  .024 682        .88  .579 070       1.28  .924 505
  .49  .030 017        .89  .593 316       1.29  .928 288
  .50  .036 055        .90  .607 270       1.30  .931 908
  .51  .042 814        .91  .620 928       1.31  .935 370
  .52  .050 306        .92  .634 286       1.32  .938 682
  .53  .058 534        .93  .647 338       1.33  .941 848
  .54  .067 497        .94  .660 082       1.34  .944 872
  .55  .077 183        .95  .672 516       1.35  .947 756
  .56  .087 577        .96  .684 636       1.36  .950 512
  .57  .098 656        .97  .696 444       1.37  .953 142
  .58  .110 395        .98  .707 940       1.38  .955 650
  .59  .122 760        .99  .719 126       1.39  .958 040
  .60  .135 718       1.00  .730 000       1.40  .960 318
  .61  .149 229       1.01  .740 566       1.41  .962 486
  .62  .163 225       1.02  .750 826       1.42  .964 552
  .63  .177 753       1.03  .760 780       1.43  .966 516
  .64  .192 677       1.04  .770 434       1.44  .968 382
  .65  .207 987       1.05  .779 794       1.45  .970 158
  .66  .223 637       1.06  .788 860       1.46  .971 846
  .67  .239 582       1.07  .797 636       1.47  .973 448
  .68  .255 780       1.08  .806 128       1.48  .974 970

TABLE of L(z), Concluded

   z     L(z)           z     L(z)           z     L(z)

 1.49  .976 412       1.89  .998 421       2.29  .999 944
 1.50  .977 782       1.90  .998 536       2.30  .999 949
 1.51  .979 080       1.91  .998 644       2.31  .999 954
 1.52  .980 310       1.92  .998 744       2.32  .999 958
 1.53  .981 476       1.93  .998 837       2.33  .999 962
 1.54  .982 578       1.94  .998 924       2.34  .999 965
 1.55  .983 622       1.95  .999 004       2.35  .999 968
 1.56  .984 610       1.96  .999 079       2.36  .999 970
 1.57  .985 544       1.97  .999 149       2.37  .999 973
 1.58  .986 426       1.98  .999 213       2.38  .999 976
 1.59  .987 260       1.99  .999 273       2.39  .999 978
 1.60  .988 048       2.00  .999 329       2.40  .999 980
 1.61  .988 791       2.01  .999 380       2.41  .999 982
 1.62  .989 492       2.02  .999 428       2.42  .999 984
 1.63  .990 154       2.03  .999 474       2.43  .999 986
 1.64  .990 777       2.04  .999 516       2.44  .999 987
 1.65  .991 364       2.05  .999 552       2.45  .999 988
 1.66  .991 917       2.06  .999 588       2.46  .999 989
 1.67  .992 438       2.07  .999 620       2.47  .999 990
 1.68  .992 928       2.08  .999 650       2.48  .999 991
 1.69  .993 389       2.09  .999 680       2.49  .999 992
 1.70  .993 823       2.10  .999 705       2.50  .999 9925
 1.71  .994 230       2.11  .999 728       2.55  .999 9956
 1.72  .994 612       2.12  .999 750       2.60  .999 9974
 1.73  .994 972       2.13  .999 770       2.65  .999 9984
 1.74  .995 309       2.14  .999 790       2.70  .999 9990
 1.75  .995 625       2.15  .999 806       2.75  .999 9994
 1.76  .995 922       2.16  .999 822       2.80  .999 9997
 1.77  .996 200       2.17  .999 838       2.85  .999 99982
 1.78  .996 460       2.18  .999 852       2.90  .999 99990
 1.79  .996 704       2.19  .999 864       2.95  .999 99994
 1.80  .996 932       2.20  .999 874       3.00  .999 99997
 1.81  .997 146       2.21  .999 886
 1.82  .997 346       2.22  .999 896
 1.83  .997 533       2.23  .999 904
 1.84  .997 707       2.24  .999 912
 1.85  .997 870       2.25  .999 920
 1.86  .998 023       2.26  .999 926
 1.87  .998 145       2.27  .999 934
 1.88  .998 297       2.28  .999 940







BOOK REVIEW 

Fundamentals of Statistics. Truman Lee Kelley. Harvard University Press,
1947; pp. xvi, 755. $10.00.

Reviewed by A. M. Mood
Iowa State College

First, a brief look at the contents: introductory matter, broad classifications 
of types of data, quantitative and qualitative aspects of data, construction of 
tables, charts, and graphs—200 pages; location and scale parameters, and 
moments—75 pages; normal distribution—30 pages; exact sampling distributions 
based on normal theory—5 pages; binomial distribution, goodness of fit tests, 
contingency tables, normal approximation to the distribution of the variance 
ratio, properties of Chi-square—20 pages; correlation and regression—150 pages. 

These first 480 pages constitute the essential part of the book and the part that 
will be commented on here. But there are 270 more pages, the content of which 
we shall merely note without comment. There is a chapter of 90 pages entitled 
“Sundry Statistical Issues and Procedures” which discusses fifteen issues such as 
periodicity, time series, curve fitting, variance error of a coefficient corrected for 
attenuation, machine extraction of square roots, and sequential analysis. There 
follows a chapter of 40 pages devoted to no less than twenty-three topics in 
mathematics, topics such as: matrices and determinants, the square root trans¬ 
formation, expanding a table, spaces of three or more dimensions, and Fourier 
series. The remaining 140 pages contain numerical tables, references, various 
indexes, and a test designed to measure the adequacy of students’ mathematical 
preparation. 

This then is another book which deals with the descriptive aspects of statistics. 
Despite its title, it omits discussion of distribution theory, sampling theory, 
the theory of estimation, tests of hypotheses, or the theory of probability. The 
phrase “confidence interval” appears not once, I believe, in the entire 750 pages. 
The discussion of Student’s distribution is brief enough to be quoted in its 
entirety (page 284): “The t-distribution, shown through the courtesy of Dr. 
Philip J. Rulon, in Chart VIIIII, is appropriate for interpreting the significance 
of means, differences of means, and of regression coefficients, for small samples— 
say N less than 15. It is the distribution of these statistics computed from small 
samples drawn from a parent normal distribution.” 

Thus the author denies any value to the developments in the fundamentals of 
statistics during the past twenty-five or thirty years. He does this not merely 
by implication but in so many words; referring to modern statistical inference, 
he writes (page 13): “A still greater weakness is that it is essentially a deductive 
procedure and relatively sterile in suggesting new courses—in inspiring creative 
inferences. It is fundamentally a method of proof and not one of invention.” 


He is therefore fully aware of his extreme position, and takes great pains to 
justify it. His thesis is that the main purpose of statistics is to suggest new 
hypotheses to the scientist. In developing this thesis he writes (page 15): “The 
physicist observes seemingly irregular changes in x as y changes. He repeats 
his experiment, controlling more and more of the conditions, and repeats again 
and again, and, if successful, he reaches a law at the end of his work. He has 
been using statistics.” But his discussion avoids certain relevant questions. 
Why does the physicist repeat the experiment? Why did he perform it in the 
first place? Did he suspect before he collected any data that x and y might be 
related? 

At any rate, the opinion of most present-day statisticians is that the primary 
role of statistics in scientific research is statistical inference. This opinion is 
certainly well-founded in my own experience. Here at Iowa State College the 
Statistical Laboratory is intimately implicated in the research programs of all 
departments—physical, biological, and social. These scientists perform their 
experiments with a specific purpose in mind—usually the estimation of some 
parameters, sometimes the testing of a hypothesis. They never seem to seek 
in a collection of data some new hypothesis by artful selection between the mean, 
the mode, the geometric mean, the harmonic mean, and the median. 

It must be reported that, even as a book on descriptive statistics, it leaves 
much to be desired. The errors usually found in such books are to be found here 
as well as many more. There is the long discussion of skewness and kurtosis 
based on the false notion that moments are determined by the nature of the 
distribution in the neighborhood of the mean. Certain properties of the normal 
distribution are imputed to all distributions. Erroneous criteria for selecting 
amongst the many means are given. The universality of the normal distribution 
seems exaggerated; thus, for example, referring to deviations from regressions 
(page 3G4): “Since the quantities (a,’ 0 — ^o) are ‘errors’ we may regularly assume 
them to be normally distributed.” Population parameters and their estimates
are confused. The book contains a great many statements (like the final one in 
the section on the Student distribution quoted above) which are so carelessly 
written that they have to be counted as errors. Several of the derivations and 
arguments are also carelessly constructed; an extreme example of this appears on 
page 206: “Is the mean an unbiased statistic? $M = (x_a + x_b + x_c + \cdots + x_n)/N$.
Since the various x's are independent, there are just N degrees of freedom
and M is unbiased.”

Students will likely have difficulty with this book. There is an air of arti¬ 
ficiality because of the omission of any discussion of population distributions and 
the notion of random sampling. Without any background of this kind it is 
hard to motivate the presentation, and the various topics become isolated. 
Moments are defined in terms of sample observations, and population moments 
are defined merely as the limits of these moments as the sample size becomes 
infinite. To introduce the mean, the author writes essentially: let us consider
the function $f(b) = [\Sigma x^b / N]^{1/b}$. There is no pointing to the middle of a distribution
function, or even a sample, or a histogram. The variance is introduced the
same way; one considers the function $\Sigma |x_i - x_j|^m / (N^2 - N)$. Technical terms
are used without definition; for example, in the passage about the mean quoted
above, the student suddenly encounters the word “unbiased” without definition
or previous discussion and must infer its meaning from the context.

Perhaps the best part of the book is the three chapters on correlation and regression.
The idea of correlation is here introduced with the discussion of a numerical
example, and several other topics are discussed in terms of examples. This part
of the book is very exhaustive; every sort of correlation coefficient is discussed 
as is every sort of correction to such coefficients. But still the writing is careless, 
and there is some confusion of ideas. The worst confusion occurs because the 
distinction between normal and intraclass correlation is never brought out; the 
discussion hops back and forth between the two ideas with no hint that they 
are not the same thing. This part of the book, too, is in the style of statistics of 
thirty years ago; the emphasis is on correlation coefficients rather than regression 
coefficients. 



NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Leo A. Aroian of Hunter College has been promoted to an assistant 
professorship. 

Mr. Carl A. Bennett is now with the General Electric Co., Hanford Engineering 
Project, Richland, Washington, as an engineer in the Statistical Division. 

Dr. Arthur B. Brown has been promoted from an Assistant Professor to an 
Associate Professor of Mathematics at Queens College, Flushing, New York. 

Professor Maurice H. Belz has returned to the University of Melbourne,
Carlton, Australia, after having spent six months in the United States.

Dr. Edward E. Cureton, member of Richardson, Bellows, Henry & Co., Inc., 
industrial psychologists, is now at the United States Naval Air Station, Pensa¬ 
cola, Florida working on a project with the Navy. The object of this project is 
to improve ground school training, especially instructor training, in the Naval 
Air Training Command. 

Mr. Eric F. Gardner has accepted an assistant professorship at the School of 
Education, Syracuse University, Syracuse, New York. 

Mr. Lee S. Gunlogson, formerly with the Lumbermens Mutual Casualty Co.
at Chicago, is now with the Marketing Services Division, Carrier Corporation,
Syracuse, New York.

Dr. Theodore E. Harris has accepted a position with the Douglas Aircraft Co., 
Santa Monica, California. 

Dr. Manuel O. Hizon, a former graduate student in the Mathematics Department,
University of Michigan, is now with the Bureau of Banking, Manila,
Philippines, as Actuary-Examiner.

Mr. Julius Lieblein, formerly in the Treasury Department, Washington, D. C.,
has transferred to the Statistical Engineering Laboratory, National Bureau of
Standards, where he is working on problems in acceptance sampling and process
control.

Mr. Jack Moshman, formerly a tutor of mathematics at Queens College, 
Flushing, New York, has been appointed to the staff of the Department of 
Mathematics, University of Tennessee. 

Dr. Horace W. Norton, formerly with the U.S. Weather Bureau, Washington, 
D. C. as meteorologist, is now at Oak Ridge, Tennessee. His position there is 
to study the application of statistics to reliability of weighings and analyses in 
connection with accountability for source and fissionable materials. 

Mr. Emil D. Schell of the Bureau of Labor Statistics has been appointed Chief 
of the Mathematics and Electronic Computor Branch in the Office of the Comp¬ 
troller, United States Air Forces. 

Miss Bernice Scherl, formerly with the Schenley Research Institute, Inc., New 
York, has accepted a position as Statistician, Shell Oil Co., New York. 



Dr. Irving E. Segal, who has been an assistant at the Institute for Advanced 
Study at Princeton, New Jersey, has accepted an assistant professorship in the 
Mathematics Department, University of Chicago. 

Miss Rosedith Sitgreaves, assistant statistician in the United States Public 
Health Service, has returned to her position in Washington after doing advanced 
study at Columbia University. 

Dr. John E. Walsh, who received his doctor’s degree in mathematics from 
Princeton University last October, is now employed by Douglas Aircraft Co., 
Inc. of Santa Monica, California. 

Mr. Winfred P. Wilson, a former graduate student at the University of Michi¬ 
gan, has accepted an assistant professorship at the University of Houston, 
Houston, Texas in the Department of Mathematics.


Announcement has been received of a new journal, The British Journal of 
Psychology, Statistical Section, which is published by the Council of the British
Psychological Society. The editors are Professor Sir Cyril Burt and Professor 
Godfrey Thomson. The first issue has been published and later issues will be 
published as material warrants. Subscriptions and inquiries should be sent to 
the University of London Press, Ltd., Warwick Square, London, E. C. 4. 

Announcement of Navy Department Joint Board of U. S. Civil Service 

Examiners 

Implementing its scientific research and development program both geo¬ 
graphically and in new fields of endeavor, the Navy Department is currently 
expanding three comparatively new, permanent laboratories in California. 
Heretofore, the Navy Department’s scientific centers have been concentrated in 
the eastern and eastern seaboard areas. 

Two of the laboratories have been established as the logical outgrowth of 
programs carried on by universities during the war. The Naval Ordnance Test 
Station, China Lake (formerly Inyokern), California, 160 miles from Los Angeles, 
was originally an activity of the California Institute of Technology. Its present 
program involves research, development and test work with ordnance equipment 
and explosives. The Navy Electronics Laboratory at San Diego, California is 
the outgrowth of work done by the University of California. It is concerned 
with research, testing and development of electronic control devices, detection 
equipment, instrumentation equipment and training aids. The Naval Air 
Missile Test Center at Point Mugu on the coast of California, 60 miles north of
Los Angeles, was established when the need for an installation became apparent
as the result of the Navy Department’s activities on guided missiles. The Test 
Center’s activities are concerned with flight and laboratory testing and evaluation 
of guided missiles and their components. 

Each of the establishments has current need for qualified personnel in a variety 
of scientific fields to staff its laboratories. Recently completed at the Naval
Ordnance Test Station is Michelson Laboratory at a cost of $6,000,000. Many
more millions of dollars have been spent in equipment and facilities. Additional 
construction and facilities are planned for both the Air Missile Test Center and 
the Electronics Laboratory. 

The work programs of the laboratories are planned, directed and accomplished 
under the direction of an outstanding staff of civilian scientists. Extensive use 
is made of the council method of operation. Constant liaison is maintained with
other research organizations, universities, scientific associations, and outstanding 
authorities throughout the nation. 

Professional positions are in the career service of the Federal government under 
Civil Service laws. Examinations are now open in the three scientific establish¬ 
ments in the following professional fields: Chemist, Mathematician, Metallurgist, 
Meteorologist, Physicist, Statistician, Scientific Research Administrator and 
Scientific Staff Assistant. 

Examinations are also open in the following branches of the Engineering 
profession: Aeronautical, Chemical, Civil, Electrical, Electronics, General, 
Industrial, Material, Mechanical, Metallurgical, Ordnance, Safety and Structural. 

Salaries for most of the positions range from $3397 to $9975 per annum. 
Salaries are predicated on the level of ability, knowledge and experience required 
to effectively discharge the duties of a specific position. 

Further information may be obtained from the Navy Department Joint Board 
of U. S. Civil Service Examiners, 1030 East Green Street, Pasadena 1, California. 


Reorganization of Philosophy of Science Association 

The Philosophy of Science Association has been reorganized with Philipp Frank 
of Harvard University as President; C. West Churchman of Wayne University, 
Detroit, as Secretary-Treasurer. 

The following are members of the Governing Committee: Gustav Bergmann, 
State University of Iowa; Thomas A. Cowan, Wayne University; Clyde Kluck- 
hohn, Harvard University; Sebastian Littauer, Columbia University; F. S. C. 
Northrop, Yale University.

The official journal of the Association is the Philosophy of Science of which 
Professor C. West Churchman is Acting Editor. Manuscripts should be sent to 
the Acting Editor. 

Applications for membership may be sent to the Secretary-Treasurer. Dues 
are $5.00 a year. 

The Association encourages the establishment of local groups in the philosophy 
of science. 

Columbia University Conference on Industrial Experimentation. 

The School of Engineering of Columbia University in the City of New York 
announces an Intersession Five-day Intensive Training Conference on Industrial
Experimentation to be offered September 14-18, 1948 by the Department of Industrial
Engineering in cooperation with the Department of Mathematical Statistics
of the Graduate Faculty of Political Science.

The lecturing will be shared by Professors S. B. Littauer and J. Wolfowitz and
a staff of special lecturers drawn from industry. 

A descriptive brochure will be ready for mailing in the latter part of July. 
For further details, interested persons may communicate directly with Professor 
S. B. Littauer, Department of Industrial Engineering, Columbia University, New 
York 27, New York. 


New Members 

The following persons have been elected to membership in the Institute 
(December 1, 1947 to February 28, 1948) 

Angulo, Walter J., B.E. (Johns Hopkins Univ.) Graduate student at Johns Hopkins University,
6229 Beaufort Ave., Baltimore 15, Maryland.

Beard, Helen P., Ph.D. (Mass. Institute of Tech.) Assistant Professor of Mathematics,
Newcomb College, New Orleans 18, Louisiana. 

Blomquist, Nils G., (Univ. of Stockholm) Statistician, Sverige Reinsurance Company, 
Aladdinsvagen 47, Smedslatten, Sweden. 

Bodwell, Charles A., M.S. (Univ. of Michigan) Graduate student at the University of 
Michigan, Box 773, West Lodge, Ypsilanti, Michigan. 

Burnett, Jean, M.S. (Mich. State College) Instructor in Mathematics, Michigan State 
College, 702 Cherry Lane, East Lansing, Michigan.

Burton, Robert E., Student at Michigan University, 1239 Atkinson Avenue, Detroit 2,
Michigan. 

Byrd, Paul F., M.S. (Univ. of Chicago) Meteorologist, U.S.A.F., Weather Detachment, 
Lockbourne Air Base, Columbus 17, Ohio. 

Cernuschi, Felix, Ph.D. (Univ. of Cambridge) Professor at the University of Montevideo, 
Asociacion Uruguaya de Estadistica, Av. Agraciada 1464, Montevideo, Uruguay. 

Connor, William S., Jr., M.A. (Univ. of North Carolina) Associate Professor of Economics, 
University of Kentucky, College of Commerce, Lexington, Kentucky. 

Dalenius, Tore, Fil. kand. Hastholmsvagen 16, Stockholm, Sweden. 

Davis, Roderic C., M.S. (Calif. Institute of Tech.) P-6 Mathematician, Head of Assessment 
Section, P.O. Box N-467, N.O.T.S., Inyokern, California. 

Dvoretzky, Aryeh, Ph.D. (Hebrew Univ., Jerusalem) Research Fellow at Hebrew University,
Jerusalem, c/o American Friends Hebrew University, 9 East 89th St., New York.

Gardner, Robert S., M.S. (Tulane Univ.) Instructor in Statistics, Mathematics Department 
at Ohio State University, 215 W. Eleventh Avenue, Columbus, Ohio. 

Goins, Mary, M.S. (Univ. of Mich.) Assistant Professor of Mathematics, Marshall College, 
436 Ninth Avenue, Huntington, West Virginia. 

Hratz, Joseph A., B.A. (St. Ambrose College) Instructor of Mathematics, St. Ambrose 
College, Davenport, Iowa. 

Kelly, Harriet J., Ph.D. (Univ. of Iowa) Research Associate, Head of Statistical Dept. 
Children’s Fund of Michigan, 660 Frederick, Detroit 2, Michigan. 

Kincaid, Wilfred M., Ph.D. (Brown Univ.) Instructor in Mathematics, University of 
Michigan, Ann Arbor, Michigan. 

Kish, Leslie, B.S. (College of the City of N. Y.) Senior Sampling Statistician, 701 Mt. Pleas¬ 
ant, Ann Arbor, Michigan. 





Lehr, Marguerite, Ph.D. (Bryn Mawr, Pa.) Associate Professor of Mathematics, Bryn Mawr
College, Cartref, Bryn Mawr, Pennsylvania. 

Loizelier, Enrique Blanco, M.A. (Madrid Univ.) Professor of Statistics, Faculty of Economics,
Madrid University, Nervion 4, Madrid, Spain.

Lorenzo, Cesar M., M.A. (American Univ., Wash., D. C.) Statistician, Food and Agriculture 
Organization of the United Nations, 17S5 De Sales Street, N.W., Washington 6, D. C.

Lott, Fred W., Jr., M.A. (Univ. of Mich.) Teaching Fellow at the University of Michigan, 
12SS Malden Court, Willow Run, Michigan.

Mantel, Nathan, B.S. (City College of New York) Biostatistician, U. S. Public Health Service,
44&1 Lily Ponds Dr., N.E., Washington 19, D. C.

Marrian, Dixon M., A.M. (Columbia Univ.) Master at the Gilman Country School, Baltimore,
Md., 1506 Shadyside Road, Baltimore 18, Maryland.

Martins, Octavio Augusto L., M.A. (Columbia Univ.) Tecnico de Educacao, Department of 
National Education, Rio de Janeiro, Rua Fiqueiredo de Magalhaes 27, Apt. 702, Rio de 
Janeiro, Brazil. 

Meyer, Herbert Albert, Ph.D. (Univ. of Iowa) Associate Professor of Mathematics, 2015
West Columbia, Gainesville, Florida. 

Nikitich, Nicholas, B.C.S. (New York Univ.) Timekeeper, 20 Featherbed Lane, New York
52, New York. 

Oakland, Gail Barker, M.A. (Univ. of Minnesota) Associate Professor of Statistics, University
of Manitoba, Winnipeg, Canada.

Olkin, Ingram, B.S. (College of City of N. Y.) Graduate Student at Columbia University, 
245 Fort Washington Ave., New York 32, New York.



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The thirty-third meeting of the Institute of Mathematical Statistics was held 
at Columbia University, New York City, New York on Wednesday afternoon 
and Thursday, April 14 and 15, 1948. The meeting was attended by 158 persons,
including the following 78 members of the Institute: 

M. Afzal, L. A. Aroian, R. M. Auer, W. D. Baten, R. E. Bechhofer, J. H. Bushey, J. M.
Cameron, B. H. Camp, G. C. Campbell, S. D. Canter, Manuel Cynamon, Tore Dalenius, 
J. F. Daly, J. L. Doob, C. W. Dunnett, Aryeh Dvoretzky, Churchill Eisenhart, Benjamin 
Epstein, M. W. Eudey, D. A. Fraser, M. A. Geisler, Mary Goins, H. H. Goode, E. J. 
Gumbel, M. H. Hansen, Mina Haskind, L. H. Herback, S. M. Ikhtiar-ul-Mulk, Seymour
Jablon, L. F. Knudsen, Jack Laderman, Howard Levene, S. B. Littauer, F. M. Lord, 
Irving Lorge, Eugene Lukacs, W. G. Madow, Sophie Marcuse, Robert Mirsky, E. B. 
Mode, D. J. Morrow, Frederick Mosteller, D. N. Nanda, P. M. Neurath, G. E. Noether,
M. L. Norden, Ingram Olkin, P. S. Olmstead, A. L. O’Toole, Katharine Pease, E. J. Pitman,
W. A. Reynolds, J. S. Rhodes, H. E. Robbins, H. G. Romig, Ernest Rubin, Herman
Rubin, P. J. Rulon, Frank Saidel, G. R. Seth, M. A. Schlorek, S. S. Shrikhande, Rose- 
dith Sitgreaves, Milton Sobel, Emma Spaney, F. F. Stephan, B. R. Suydam, Henry 
Teicher, J. W. Tukey, A. Wald, H. M. Walker, J. E. Walsh, S. S. Wilks, Dzung-shu Wei, 
Lionel Weiss, Jacob Wolfowitz, C. A. Wright, Mohammad Yusuf. 

The Wednesday afternoon session, Professor S. B. Littauer of Columbia Uni¬ 
versity presiding, was devoted to the following two invited addresses: 

1. Incomplete Block Designs 

Professor R. C. Bose, Calcutta University and the University of North Carolina 

2. Non-Parametric Inference 

Professor J. G. Pitman, University of Tasmania and Columbia University 

The Thursday morning session, Professor Hobart Bushey of Hunter College 
presiding, consisted of a Symposium on Scales of Measurement at which two 
invited papers: 

1. The Development of Psychological Scaling Techniques 
Professor Harold Gulliksen, Princeton University 

2. A Generalized Model for Scales 

Professor Paul Lazarsfeld, Columbia University

were followed by prepared discussion by Professors Phillip Rulon of Harvard 
University and John Tukey of Princeton University. 

The Thursday afternoon session, Dr. Harry G. Romig of Bell Telephone 
Laboratories presiding, was devoted to the following contributed papers: 

1. Optimum Character of the Sequential Probability Ratio Test 
Professors Abraham Wald and Jacob Wolfowitz, Columbia University 

2. Multi-parameter Sequential Estimation 
Mr. G. R. Seth, Columbia University 



3. The Distribution of a Definite Quadratic Form 
Professor Herbert Robbins, University of North Carolina 

4. The Moments and Cumulants of the Product of 2, 3, or 4 Dependent Variables (Preliminary
Report)

Professor Leo A. Aroian, Hunter College 

5. Generalization to N Dimensions of Inequalities of the Tchebycheff Type
Professor Burton H. Camp, Wesleyan University 

6. On the Power Function of a Sign Test Formed by Using Subsamples 
Dr. John E. Walsh, Project Rand 

7. The Distribution of T a , a Multivariate Generalization of the F-test. 

Miss Dorothy J. Morrow, University of North Carolina. 

8. Approximate Confidence Points (Preliminary Report)

Professor John Tukey, Princeton University 

At all of the sessions there was active discussion from the floor. 

On Wednesday evening members and guests had dinner at the Men’s Faculty 
Club. 


S. B. Littauer 
Assistant Secretary 



SANKHYĀ

The Indian Journal of Statistics 

Edited by P. C. Mahalanobis 

Vol VIII, Part 3, 1947 

On some analogues of the amount of information and their use in statistical
estimation. A. Bhattacharyya

The existence of collectives in abstract space. C. F. Kossack

On recursion formulae, tables and Bessel function populations associated with
the distribution of Classical D²-Statistic. P. K. Bose

On a resolvable series of balanced incomplete block designs. R. C. Bose

Notes on testing of composite hypotheses. S. N. Roy

On the general law of demand for raw jute. T. P. Chatterjee

Miscellaneous Notes

Annual subscription 30 rupees
Inquiries and orders may be addressed to the
Editor, Sankhyā, Presidency College, Calcutta, India.


ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 16, No. 1, January, 1948

                                                                        Page
J. Neyman and Elizabeth L. Scott: Consistent Estimates Based on
    Partially Consistent Observations .................................    1
Report of the Washington Meeting, September 6-18, 1947 ................   33
Report of the Council for 1947 ........................................  112
Treasurer’s Report ....................................................  113
Rules for Electing Fellows ............................................  116
Election of Fellows, 1947 .............................................  117
Fellows of the Econometric Society, January, 1948 .....................  123
Current Activities in Econometrics ....................................  125

Published Quarterly. Subscription to Nonmembers: $9.00 per year

The Econometric Society is an international society for the advancement of
economic theory in its relation to statistics and mathematics.

Subscriptions to Econometrica and inquiries about the work of the Society and
the procedure in applying for membership should be addressed to Alfred Cowles,
Secretary and Treasurer, The Econometric Society, The University of Chicago,
Chicago 37, Illinois, U.S.A.





A CLASS OF STATISTICS WITH ASYMPTOTICALLY NORMAL 
DISTRIBUTION 1 

By Wassily Hoeffding 
Institute of Statistics, University of North Carolina

1. Summary. Let $X_1, \dots, X_n$ be $n$ independent random vectors,
$X_\nu = (X_\nu^{(1)}, \dots, X_\nu^{(r)})$, and $\Phi(x_1, \dots, x_m)$ a function of $m\ (\le n)$ vectors
$x_\nu = (x_\nu^{(1)}, \dots, x_\nu^{(r)})$. A statistic of the form
$U = \sum'' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}) / n(n-1) \cdots (n-m+1)$, where the sum $\sum''$ is
extended over all permutations $(\alpha_1, \dots, \alpha_m)$ of $m$ different integers,
$1 \le \alpha_i \le n$, is called a $U$-statistic. If $X_1, \dots, X_n$ have the same (cumulative)
distribution function (d.f.) $F(x)$, $U$ is an unbiased estimate of the population characteristic

$\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m)$.

$\theta(F)$ is called a regular functional of the d.f. $F(x)$.
Certain optimal properties of $U$-statistics as unbiased estimates of regular functionals
have been established by Halmos [9] (cf. Section 4).

The variance of a $U$-statistic as a function of the sample size $n$ and of certain
population characteristics is studied in Section 5. 

It is shown that if $X_1, \dots, X_n$ have the same distribution and $\Phi(x_1, \dots, x_m)$
is independent of $n$, the d.f. of $\sqrt{n}\,(U - \theta)$ tends to a normal d.f. as $n \to \infty$
under the sole condition of the existence of $E\Phi^2(X_1, \dots, X_m)$. Similar results
hold for the joint distribution of several $U$-statistics (Theorems 7.1 and 7.2),
for statistics $U'$ which, in a certain sense, are asymptotically equivalent to $U$
(Theorems 7.3 and 7.4), for certain functions of statistics $U$ or $U'$ (Theorem 7.5)
and, under certain additional assumptions, for the case of the $X_\nu$’s having different
distributions (Theorems 8.1 and 8.2). Results of a similar character,
though under different assumptions, are contained in a recent paper by
von Mises [18] (cf. Section 7).

Examples of statistics of the form $U$ or $U'$ are the moments, Fisher’s $k$-statistics,
Gini’s mean difference, and several rank correlation statistics such as Spearman’s
rank correlation and the difference sign correlation (cf. Section 9).
Asymptotic power functions for the non-parametric tests of independence based 
on these rank statistics are obtained. They show that these tests are not un¬ 
biased in the limit (Section 9f). The asymptotic distribution of the coefficient 
of partial difference sign correlation which has been suggested by Kendall also 
is obtained (Section 9h). 

2. Functionals of distribution functions. Let $F(x) = F(x^{(1)}, \dots, x^{(r)})$ be
an $r$-variate d.f. If to any $F$ belonging to a subset $\mathcal{D}$ of the set of all d.f.’s in the
$r$-dimensional Euclidean space is assigned a quantity $\theta(F)$, then $\theta(F)$ is called a
functional of $F$, defined on $\mathcal{D}$. In this paper the word functional will always
mean functional of a d.f.

1 Research under a contract with the Office of Naval Research for development of
multivariate statistical theory.

An infinite population may be considered as completely determined by its 
d.f., and any numerical characteristic of an infinite population with d.f. F that 
is used in statistics is a functional of $F$. A finite population, or sample, of size $n$
is determined by its d.f., $S(x)$ say, and its size $n$. $n$ itself is not a functional of $S$
since two samples of different size may have the same d.f. 

If $S(x^{(1)}, \dots, x^{(r)})$ is the d.f. of a finite population, or a sample, consisting
of $n$ elements

(2.1)    $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n),$

then $nS(x^{(1)}, \dots, x^{(r)})$ is the number of elements $x_\alpha$ such that

$x_\alpha^{(1)} \le x^{(1)}, \ \dots, \ x_\alpha^{(r)} \le x^{(r)}.$

Since $S(x^{(1)}, \dots, x^{(r)})$ is symmetric in $x_1, \dots, x_n$, and retains its value for a
sample formed from the sample (2.1) by adding one or more identical samples,
the same two properties hold true for a sample functional $\theta(S)$. Most statistics
in current use are functions of $n$ and of functionals of the sample d.f.
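
The two properties just stated can be illustrated with a short Python sketch. It evaluates a sample d.f. at one point and checks that the value is unchanged when the sample is reordered or when an identical copy is appended. The function name `sample_df` and the data are illustrative assumptions, not part of the paper.

```python
def sample_df(sample, x):
    """S(x): fraction of elements x_a with x_a^(i) <= x^(i) in every coordinate i."""
    hits = sum(all(a_i <= x_i for a_i, x_i in zip(a, x)) for a in sample)
    return hits / len(sample)

# Hypothetical bivariate sample (r = 2, n = 4).
sample = [(1.0, 2.0), (0.5, 3.0), (2.0, 1.0), (1.5, 1.5)]
point = (1.5, 2.0)

s_original = sample_df(sample, point)
s_reordered = sample_df(sample[::-1], point)    # symmetry in x_1, ..., x_n
s_doubled = sample_df(sample + sample, point)   # same d.f., different size n
```

The doubled sample has the same d.f. but twice the size, which is why $n$ is not a functional of $S$.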

A random sample $\{X_1, \dots, X_n\}$ is a set of $n$ independent random vectors

(2.2)    $X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n).$

For any fixed values $x^{(1)}, \dots, x^{(r)}$, the d.f. $S(x^{(1)}, \dots, x^{(r)})$ of a random sample
is a random variable. The functional $\theta(S)$, where $S$ is the d.f. of the random
sample, is itself a random variable, and may be called a random functional.

A remarkable application of the theory of functionals to functionals of d.f.’s 
has been made by von Mises [18] who considers the asymptotic distributions of 
certain functionals of sample d.f.’s. (Cf. also Section 7.) 

3. Unbiased estimation and regular functionals. Consider a functional
$\theta = \theta(F)$ of the $r$-variate d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and suppose that for some
sample size $n$, $\theta$ admits an unbiased estimate for any d.f. $F$ in $\mathcal{D}$. That is, if
$X_1, \dots, X_n$ are $n$ independent random vectors with the same d.f. $F$, there exists
a function $\varphi(x_1, \dots, x_n)$ of $n$ vector arguments (2.1) such that the expected
value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$, or

(3.1)    $\int \cdots \int \varphi(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n) = \theta(F)$

for every $F$ in $\mathcal{D}$. Here and in the sequel, when no integration limits are indicated,
the integral is extended over the entire space of $x_1, \dots, x_n$. The integral
is understood in the sense of Stieltjes-Lebesgue.

The estimate $\varphi(x_1, \dots, x_n)$ of $\theta(F)$ is called unbiased over $\mathcal{D}$.

A functional $\theta(F)$ of the form (3.1) will be referred to as regular over $\mathcal{D}$.²

2 This is an adaptation to functionals of d.f.’s of the term “regular functional” used by
Volterra [21].





Thus, the functionals regular over $\mathcal{D}$ are those admitting an unbiased estimate
over $\mathcal{D}$.

If $\theta(F)$ is regular over $\mathcal{D}$, let $m\ (\le n)$ be the smallest sample size for which there
exists an unbiased estimate $\Phi(x_1, \dots, x_m)$ of $\theta$ over $\mathcal{D}$:

(3.2)    $\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m)$

for any $F$ in $\mathcal{D}$. Then $m$ will be called the degree over $\mathcal{D}$ of the regular
functional $\theta(F)$.

If the expected value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$ whenever it exists,
$\varphi(x_1, \dots, x_n)$ will be called a distribution-free unbiased estimate (d.-f. u.e.) of $\theta(F)$.
The degree of $\theta(F)$ over the set $\mathcal{D}_0$ of d.f.’s $F$ for which the right hand side of (3.1)
exists will be simply termed the degree of $\theta(F)$.

A regular functional of degree 1 over $\mathcal{D}$ is called a linear regular functional
over $\mathcal{D}$. If $\theta(F)$ has the same value for all $F$ in $\mathcal{D}$, $\theta(F)$ may be termed a regular
functional of degree zero over $\mathcal{D}$.

Any function $\Phi(x_1, \dots, x_m)$ satisfying (3.2) will be referred to as a kernel of
the regular functional $\theta(F)$.

For any regular functional $\theta(F)$ there exists a kernel $\Phi_0(x_1, \dots, x_m)$ symmetric
in $x_1, \dots, x_m$. For if $\Phi(x_1, \dots, x_m)$ is a kernel of $\theta(F)$,

(3.3)    $\Phi_0(x_1, \dots, x_m) = \frac{1}{m!} \sum \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the sum is taken over all permutations $(\alpha_1, \dots, \alpha_m)$ of $(1, \dots, m)$, is a
symmetric kernel of $\theta(F)$.
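
The symmetrization in (3.3) can be sketched in a few lines of Python. The helper name `symmetrize` and the example kernel are illustrative assumptions. The kernel $x_1^2 - x_1 x_2$ is an asymmetric kernel of the variance, and averaging it over both argument orders gives $(x_1 - x_2)^2/2$.

```python
from itertools import permutations
from math import factorial

def symmetrize(kernel, m):
    """Return Phi_0 of (3.3): the average of `kernel` over all m! argument orders."""
    def phi0(*xs):
        if len(xs) != m:
            raise ValueError("expected %d arguments" % m)
        total = sum(kernel(*(xs[i] for i in order)) for order in permutations(range(m)))
        return total / factorial(m)
    return phi0

# Hypothetical asymmetric kernel of the variance (degree m = 2).
def phi(x1, x2):
    return x1 * x1 - x1 * x2

phi0 = symmetrize(phi, 2)
value = phi0(1.0, 3.0)   # average of phi(1, 3) = -2 and phi(3, 1) = 6
```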

If $\theta_1(F)$ and $\theta_2(F)$ are two regular functionals of degrees $m_1$ and $m_2$ over $\mathcal{D}$,
then the sum $\theta_1(F) + \theta_2(F)$ and the product $\theta_1(F)\theta_2(F)$ are regular functionals
of degrees $\le m = \operatorname{Max}(m_1, m_2)$ and $\le m_1 + m_2$, respectively, over $\mathcal{D}$. For if
$\Phi_i(x_1, \dots, x_{m_i})$ is a kernel of $\theta_i(F)$, $(i = 1, 2)$, then

$\theta_1(F) + \theta_2(F) = \int \cdots \int \{\Phi_1(x_1, \dots, x_{m_1}) + \Phi_2(x_1, \dots, x_{m_2})\}\, dF(x_1) \cdots dF(x_m)$

and

$\theta_1(F)\theta_2(F) = \int \cdots \int \Phi_1(x_1, \dots, x_{m_1})\, \Phi_2(x_{m_1+1}, \dots, x_{m_1+m_2})\, dF(x_1) \cdots dF(x_{m_1+m_2}).$

More generally, a polynomial in regular functionals is itself a regular functional.
Examples of linear regular functionals are the moments about the origin,

$\mu'_{\nu_1, \dots, \nu_r} = \int \cdots \int (x^{(1)})^{\nu_1} \cdots (x^{(r)})^{\nu_r}\, dF(x^{(1)}, \dots, x^{(r)}).$





A moment about the mean is a polynomial in moments $\mu'$ about 0, and hence a
regular functional over the set $\mathcal{D}_0$ of d.f.’s for which it exists (cf. Halmos [9]).
For instance, the variance of $X^{(1)}$,

$\sigma^2 = \iint \{(x_1^{(1)})^2 - x_1^{(1)} x_2^{(1)}\}\, dF(x_1^{(1)})\, dF(x_2^{(1)})$

is a regular functional of degree 2. A symmetrical kernel of $\sigma^2$ is $(x_1^{(1)} - x_2^{(1)})^2/2$.
If $\mathcal{D}$ is the set of univariate d.f.’s with mean $\mu$ and existing second moment,
$\sigma^2$ is a linear regular functional of $F$ over $\mathcal{D}$, since then we have

$\sigma^2 = \int (x_1^{(1)} - \mu)^2\, dF(x_1^{(1)}).$

The function

$\frac{1}{n(n-1)} \sum_{\alpha \ne \beta} \frac{(x_\alpha^{(1)} - x_\beta^{(1)})^2}{2} = \frac{1}{n-1} \sum_{\alpha=1}^{n} (x_\alpha^{(1)} - \bar{x}^{(1)})^2$

is a distribution-free unbiased estimate of $\sigma^2$. The function


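$\frac{\Gamma\{(n-1)/2\}}{\sqrt{2}\,\Gamma(n/2)} \sqrt{\sum_{\alpha=1}^{n} (x_\alpha^{(1)} - \bar{x}^{(1)})^2}$

is known to be an unbiased estimate of $\sigma$ over the set of univariate normal d.f.’s,
but it is not a d.-f. u.e.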


4. U-statistics. Let $x_1, \dots, x_n$ be a sample of $n$ vectors (2.1) and
$\Phi(x_1, \dots, x_m)$ a function of $m\ (\le n)$ vector arguments. Consider the function
of the sample,

(4.1)    $U = U(x_1, \dots, x_n) = \frac{1}{n(n-1) \cdots (n-m+1)} \sum'' \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where $\sum''$ stands for summation over all permutations $(\alpha_1, \dots, \alpha_m)$ of $m$ integers
such that

(4.2)    $1 \le \alpha_i \le n, \quad \alpha_i \ne \alpha_j \ \text{if} \ i \ne j, \qquad (i, j = 1, \dots, m).$

$U$ is the average of the values of $\Phi$ in the set of ordered subsets of $m$ members
of the sample (2.1). $U$ is symmetric in $x_1, \dots, x_n$.

Any statistic of the form (4.1) will be called a $U$-statistic. Any function
$\Phi(x_1, \dots, x_m)$ satisfying (4.1) will be referred to as a kernel of the statistic $U$.

If $\Phi(x_1, \dots, x_m)$ is a kernel of a regular functional $\theta(F)$ defined on a set $\mathcal{D}$,
then $U$ is an unbiased estimate of $\theta(F)$ over $\mathcal{D}$:

(4.3)    $\theta(F) = \int \cdots \int U(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n)$

for every $F$ in $\mathcal{D}$.
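
As a minimal sketch of definition (4.1), the following Python function averages a kernel over all ordered $m$-tuples of distinct sample indices. The function name `u_statistic` and the data are illustrative assumptions. With the variance kernel $x_1^2 - x_1 x_2$ of Section 3, the result should coincide with the usual unbiased sample variance.

```python
from itertools import permutations
from math import perm
from statistics import variance

def u_statistic(sample, kernel, m):
    """(4.1): average of `kernel` over all ordered m-tuples of distinct indices."""
    n = len(sample)
    total = sum(kernel(*(sample[i] for i in idx)) for idx in permutations(range(n), m))
    return total / perm(n, m)   # perm(n, m) = n(n-1)...(n-m+1)

# Kernel of the variance (degree 2); it need not be symmetric for (4.1).
def variance_kernel(x1, x2):
    return x1 * x1 - x1 * x2

sample = [1.0, 2.0, 4.0, 7.0]               # hypothetical data
u = u_statistic(sample, variance_kernel, 2)
s2 = variance(sample)                        # divisor n - 1
```

The brute-force sum visits $n(n-1)\cdots(n-m+1)$ tuples, so it is meant for small illustrative samples only.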



For $n = m$, $U$ reduces to the symmetric kernel (3.3) of $\theta(F)$.

From a recent paper by Halmos [9] it follows for the case of univariate d.f.’s 
(r = 1): 

If $\theta(F)$ is a regular functional of degree $m$ over a set $\mathcal{D}$ containing all purely
discontinuous d.f.’s, $U$ is the only unbiased estimate over $\mathcal{D}$ which is symmetric
in $x_1, \dots, x_n$, and $U$ has the least variance among all unbiased estimates
over $\mathcal{D}$.

These results and the proofs given by Halmos can easily be extended to the 
multivariate case (r > 1). 

Combining (3.3) and (4.1) we may write a $U$-statistic in the form

(4.4)    $U(x_1, \dots, x_n) = \binom{n}{m}^{-1} \sum' \Phi_0(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the kernel $\Phi_0$ is symmetric in its $m$ vector arguments and the sum $\sum'$ is
extended over all subscripts $\alpha$ such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n.$


Another statistic frequently used for estimating $\theta(F)$ is $\theta(S)$, where $S = S(x)$
is the d.f. of the sample (2.1). If $S$ is substituted for $F$ in (3.2), we have

(4.5)    $\theta(S) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}).$

In particular, the sample moments have this form; their kernel $\Phi$ is obtained
by the method described in section 3.

If $m = 1$, $\theta(S) = U$. If $m = 2$,

$\theta(S) = \frac{n-1}{n}\, U + \frac{1}{n} \left\{ \frac{1}{n} \sum_{\alpha=1}^{n} \Phi(x_\alpha, x_\alpha) \right\},$

and $\theta(S)$ is a linear function of $U$-statistics with coefficients depending on $n$.
This is easily seen to be true for any $m$. In general $\theta(S)$ is not an unbiased estimate
of $\theta(F)$. If, however, the expected value of $\theta(S)$ exists for every $F$ in $\mathcal{D}$,
we have

$E\{\theta(S)\} = \theta(F) + O(n^{-1}),$

and the estimate $\theta(S)$ of $\theta(F)$ may be termed unbiased in the limit over $\mathcal{D}$.
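
The decomposition of $\theta(S)$ for $m = 2$ can be checked numerically. The sketch below assumes the kernel $\Phi(x_1, x_2) = x_1 x_2$ (a kernel of the squared mean) and made-up data; the function names are illustrative, not from the paper.

```python
from itertools import permutations, product
from math import perm

def u_statistic(sample, kernel, m):
    """(4.1): average over ordered m-tuples of distinct indices."""
    n = len(sample)
    total = sum(kernel(*(sample[i] for i in idx)) for idx in permutations(range(n), m))
    return total / perm(n, m)

def theta_s(sample, kernel, m):
    """(4.5): average over all n^m index tuples, repeats allowed."""
    n = len(sample)
    total = sum(kernel(*(sample[i] for i in idx)) for idx in product(range(n), repeat=m))
    return total / n ** m

def kernel(x1, x2):   # assumed example kernel: theta(F) is the squared mean
    return x1 * x2

sample = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
n = len(sample)
u = u_statistic(sample, kernel, 2)
v = theta_s(sample, kernel, 2)
diagonal = sum(kernel(x, x) for x in sample) / n
rhs = (n - 1) / n * u + diagonal / n   # right-hand side of the m = 2 identity
```

Here $\theta(S)$ is the squared sample mean, and the diagonal term accounts for its bias of order $n^{-1}$.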

Numerous statistics in current use have the form of, or can be expressed in
terms of, $U$-statistics. From what was said above about moments as regular
functionals, it is easy to obtain $U$-statistics which are d.-f. u.e.’s of the moments
about the mean of any order (cf. Halmos [9]). Fisher’s $k$-statistics are $U$-statistics,
as follows from their definition as unbiased estimates of the cumulants,
symmetric in the sample values. Another example is Gini’s mean difference

$\frac{1}{n(n-1)} \sum_{\alpha \ne \beta} |x_\alpha^{(1)} - x_\beta^{(1)}|.$

More examples, in particular of rank correlation statistics, will be given in 
section 9. 
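
Gini’s mean difference gives a convenient illustration of the symmetric form (4.4), in which a symmetric kernel is averaged over unordered subsets. The Python names and data below are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def u_statistic_symmetric(sample, kernel, m):
    """(4.4): average of a symmetric kernel over all subsets alpha_1 < ... < alpha_m."""
    n = len(sample)
    return sum(kernel(*subset) for subset in combinations(sample, m)) / comb(n, m)

def abs_difference(x1, x2):
    """Symmetric kernel of Gini's mean difference."""
    return abs(x1 - x2)

sample = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
gini = u_statistic_symmetric(sample, abs_difference, 2)
```

Using unordered subsets cuts the number of kernel evaluations by a factor $m!$ compared with the ordered sum in (4.1).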

5. The variance of a U-statistic. Let $X_1, \dots, X_n$ be $n$ independent random
vectors with the same d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

(5.1)    $U = U(X_1, \dots, X_n) = \binom{n}{m}^{-1} \sum' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$

where $\Phi(x_1, \dots, x_m)$ is symmetric in $x_1, \dots, x_m$ and $\sum'$ has the same meaning
as in (4.4). Suppose that the function $\Phi$ does not involve $n$.

If $\theta = \theta(F)$ is defined by (3.2), we have

$E\{U\} = E\{\Phi(X_1, \dots, X_m)\} = \theta.$

Let

(5.2)    $\Phi_c(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{c+1}, \dots, X_m)\}, \qquad (c = 1, \dots, m),$

where $x_1, \dots, x_c$ are arbitrary fixed vectors and the expected value is taken with
respect to the random vectors $X_{c+1}, \dots, X_m$. Then

(5.3)    $\Phi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Phi_c(x_1, \dots, x_{c-1}, X_c)\},$

and

(5.4)    $E\{\Phi_c(X_1, \dots, X_c)\} = E\{\Phi(X_1, \dots, X_m)\} = \theta, \qquad (c = 1, \dots, m).$

Define

(5.5)    $\Psi(x_1, \dots, x_m) = \Phi(x_1, \dots, x_m) - \theta,$

(5.6)    $\Psi_c(x_1, \dots, x_c) = \Phi_c(x_1, \dots, x_c) - \theta, \qquad (c = 1, \dots, m).$

We have

(5.7)    $\Psi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Psi_c(x_1, \dots, x_{c-1}, X_c)\},$

(5.8)    $E\{\Psi_c(X_1, \dots, X_c)\} = E\{\Psi(X_1, \dots, X_m)\} = 0, \qquad (c = 1, \dots, m).$

Suppose that the variance of $\Psi_c(X_1, \dots, X_c)$ exists, and let

(5.9)    $\zeta_0 = 0, \quad \zeta_c = E\{\Psi_c^2(X_1, \dots, X_c)\}, \qquad (c = 1, \dots, m).$

We have

(5.10)    $\zeta_c = E\{\Phi_c^2(X_1, \dots, X_c)\} - \theta^2.$

$\zeta_c = \zeta_c(F)$ is a polynomial in regular functionals of $F$, and hence itself a regular
functional of $F$ (of degree $\le 2m$).

If, for some parent distribution $F = F_0$ and some integer $d$, we have $\zeta_d(F_0) = 0$,
this means that $\Psi_d(X_1, \dots, X_d) = 0$ with probability 1. By (5.7) and (5.9),
$\zeta_d = 0$ implies $\zeta_1 = \cdots = \zeta_{d-1} = 0$.
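
As an illustration that is not part of the original text, the quantities $\Phi_c$, $\Psi_c$ and $\zeta_c$ can be worked out for the univariate variance kernel $\Phi(x_1, x_2) = (x_1 - x_2)^2/2$ of Section 3, for which $m = 2$ and $\theta = \sigma^2$. The symbol $\mu_4$ for the fourth central moment is notation assumed here.

```latex
\begin{align*}
\Phi_1(x_1) &= E\{\tfrac12 (x_1 - X_2)^2\} = \tfrac12 \{(x_1 - \mu)^2 + \sigma^2\}, \\
\Psi_1(x_1) &= \Phi_1(x_1) - \sigma^2 = \tfrac12 \{(x_1 - \mu)^2 - \sigma^2\}, \\
\zeta_1 &= E\{\Psi_1^2(X_1)\} = \tfrac14 (\mu_4 - \sigma^4), \\
\zeta_2 &= E\{\Phi^2(X_1, X_2)\} - \sigma^4
         = \tfrac14 (2\mu_4 + 6\sigma^4) - \sigma^4
         = \tfrac12 (\mu_4 + \sigma^4).
\end{align*}
```

For a normal parent distribution $\mu_4 = 3\sigma^4$, so that $\zeta_1 = \sigma^4/2$ and $\zeta_2 = 2\sigma^4$. The case $\zeta_1 = 0$ requires $\mu_4 = \sigma^4$, that is, $(X - \mu)^2$ constant with probability 1.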



If $\zeta_1(F_0) = 0$, we shall say that the regular functional $\theta(F)$ is stationary³
for $F = F_0$. If

(5.11)    $\zeta_1(F_0) = \cdots = \zeta_d(F_0) = 0, \quad \zeta_{d+1}(F_0) > 0, \qquad (1 \le d \le m),$

$\theta(F)$ will be called stationary of order $d$ for $F = F_0$.

If $(\alpha_1, \dots, \alpha_m)$ and $(\beta_1, \dots, \beta_m)$ are two sets of $m$ different integers,
$1 \le \alpha_i, \beta_i \le n$, and $c$ is the number of integers common to the two sets, we have,
by the symmetry of $\Psi$,

(5.12)    $E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \dots, X_{\beta_m})\} = \zeta_c.$

If the variance of $U$ exists, it is equal to

$\sigma^2(U) = \binom{n}{m}^{-2} \sum_{c=0}^{m} \sum\nolimits^{(c)} E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \dots, X_{\beta_m})\},$

where $\sum^{(c)}$ stands for summation over all subscripts such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n, \qquad 1 \le \beta_1 < \beta_2 < \cdots < \beta_m \le n,$

and exactly $c$ equations

$\alpha_i = \beta_j$

are satisfied. By (5.12), each term in $\sum^{(c)}$ is equal to $\zeta_c$. The number of terms
in $\sum^{(c)}$ is easily seen to be

$\frac{n(n-1) \cdots (n-2m+c+1)}{c!\,(m-c)!\,(m-c)!} = \binom{m}{c} \binom{n-m}{m-c} \binom{n}{m},$

and hence, since $\zeta_0 = 0$,

(5.13)    $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c}\, \zeta_c.$
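
The counting identity and formula (5.13) lend themselves to a quick exact check in Python. This is a sketch under stated assumptions: the function names are hypothetical, and the example values $\zeta_1 = 1/2$, $\zeta_2 = 2$ are those of the variance kernel $(x_1 - x_2)^2/2$ under a standard normal parent distribution, which is an assumption made only for illustration.

```python
from fractions import Fraction
from math import comb, factorial, prod

def count_terms(n, m, c):
    """n(n-1)...(n-2m+c+1) / (c! (m-c)! (m-c)!): pairs of m-subsets sharing exactly c indices."""
    falling = prod(range(n - 2 * m + c + 1, n + 1))   # contains the factor 0 when n < 2m - c
    return falling // (factorial(c) * factorial(m - c) ** 2)

def var_u(n, m, zeta):
    """sigma^2(U) by (5.13); zeta[c] holds zeta_c for c = 1..m (zeta[0] is unused)."""
    total = sum(comb(m, c) * comb(n - m, m - c) * zeta[c] for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)

# Counting identity behind (5.13), checked on a small grid of (n, m, c).
identity_ok = all(
    count_terms(n, m, c) == comb(m, c) * comb(n - m, m - c) * comb(n, m)
    for m in range(1, 4) for n in range(m, 9) for c in range(0, m + 1)
)

# Assumed example: variance kernel, standard normal parent.
zeta = [0, Fraction(1, 2), Fraction(2)]
variance_n10 = var_u(10, 2, zeta)   # classical value 2 sigma^4 / (n - 1) with sigma = 1
```

Exact rational arithmetic avoids rounding questions when the result is compared with the closed form $2/(n-1)$.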

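When the distributions of $X_1, \dots, X_n$ are different, $F_\nu(x)$ being the d.f. of
$X_\nu$, let

(5.14)    $\theta_{\alpha_1, \dots, \alpha_m} = E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m})\},$

(5.15)    $\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{\beta_1}, \dots, X_{\beta_{m-c}})\} - \theta_{\alpha_1, \dots, \alpha_c, \beta_1, \dots, \beta_{m-c}}, \qquad (c = 1, \dots, m),$

(5.16)    $\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\, \gamma_1, \dots, \gamma_{m-c}} = E\{\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\, \Psi_{c(\alpha_1, \dots, \alpha_c)\gamma_1, \dots, \gamma_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\},$

(5.17)    $\zeta_{c,n} = \frac{c!\,(m-c)!\,(m-c)!}{n(n-1) \cdots (n-2m+c+1)} \sum \zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\, \gamma_1, \dots, \gamma_{m-c}},$

where the sum is extended over all subscripts $\alpha$, $\beta$, $\gamma$ such that

$1 \le \alpha_1 < \cdots < \alpha_c \le n, \quad 1 \le \beta_1 < \cdots < \beta_{m-c} \le n, \quad 1 \le \gamma_1 < \cdots < \gamma_{m-c} \le n,$

$\alpha_i \ne \beta_j, \quad \alpha_i \ne \gamma_j, \quad \beta_i \ne \gamma_j.$

Then the variance of $U$ is equal to

(5.18)    $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c}\, \zeta_{c,n}.$

3 According to the definition of the derivative of a functional (cf. Volterra [21]; for
functionals of d.f.’s cf. von Mises [18]), the function $m(m-1) \cdots (m-d+1)\, \Psi_d(x_1, \dots, x_d)$,
which is a functional of $F$, is a $d$-th derivative of $\theta(F)$ with respect to $F$ at the “point” $F$
of the space of d.f.’s.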

Returning to the case of identically distributed $X$’s, we shall now prove some
inequalities satisfied by $\zeta_1, \dots, \zeta_m$ and $\sigma^2(U)$ which are contained in the following
theorems:

Theorem 5.1. The quantities $\zeta_1, \dots, \zeta_m$ as defined by (5.9) satisfy the inequalities

(5.19)    $0 \le \frac{\zeta_c}{c} \le \frac{\zeta_d}{d} \quad \text{if} \quad 1 \le c < d \le m.$

Theorem 5.2. The variance $\sigma^2(U_n)$ of a $U$-statistic $U_n = U(X_1, \dots, X_n)$,
where $X_1, \dots, X_n$ are independent and identically distributed, satisfies the inequalities

(5.20)    $\frac{m^2}{n}\, \zeta_1 \le \sigma^2(U_n) \le \frac{m}{n}\, \zeta_m.$

$n\sigma^2(U_n)$ is a decreasing function of $n$,

(5.21)    $(n+1)\, \sigma^2(U_{n+1}) \le n\, \sigma^2(U_n),$

which takes on its upper bound $m\zeta_m$ for $n = m$ and tends to its lower bound $m^2\zeta_1$
as $n$ increases:

(5.22)    $\sigma^2(U_m) = \zeta_m,$

(5.23)    $\lim_{n \to \infty} n\, \sigma^2(U_n) = m^2 \zeta_1.$

If $E\{U_n\} = \theta(F)$ is stationary of order $\ge d - 1$ for the d.f. of $X_\alpha$, (5.20) may
be replaced by

(5.24)    $\frac{m}{d}\, K_n(m, d)\, \zeta_d \le \sigma^2(U_n) \le K_n(m, d)\, \zeta_m,$

where

(5.25)    $K_n(m, d) = \binom{n}{m}^{-1} \sum_{c=d}^{m} \binom{m-1}{c-1} \binom{n-m}{m-c}.$

We postpone the proofs of Theorems 5.1 and 5.2. 
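
Before the proofs, the statements of Theorem 5.2 can be checked numerically on one example. The sketch below assumes $m = 2$ with $\zeta_1 = 1/2$ and $\zeta_2 = 2$ (the variance kernel under a standard normal parent, values chosen here for illustration), computes $\sigma^2(U_n)$ from (5.13), and tests (5.20), (5.21) and (5.22) over a range of $n$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def var_u(n, m, zeta):
    """sigma^2(U_n) from (5.13); zeta[c] is zeta_c, zeta[0] is unused."""
    total = sum(comb(m, c) * comb(n - m, m - c) * zeta[c] for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)

def k_n(n, m, d):
    """K_n(m, d) as in (5.25)."""
    total = sum(comb(m - 1, c - 1) * comb(n - m, m - c) for c in range(d, m + 1))
    return Fraction(total, comb(n, m))

m = 2
zeta = [0, Fraction(1, 2), Fraction(2)]   # assumed values; they satisfy (5.19)

checks = []
for n in range(m, 50):
    v = var_u(n, m, zeta)
    checks.append(Fraction(m * m, n) * zeta[1] <= v <= Fraction(m, n) * zeta[m])  # (5.20)
    checks.append((n + 1) * var_u(n + 1, m, zeta) <= n * v)                       # (5.21)
    checks.append(k_n(n, m, 1) == Fraction(m, n))   # K_n(m, 1) = m/n, so d = 1 gives (5.20)

all_hold = all(checks)
at_n_equals_m = var_u(m, m, zeta)   # (5.22): should equal zeta_m
```

A single example proves nothing, but it is a useful guard against misreading the constants in (5.20) and (5.24).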

(5.13) and (5.19) imply that a necessary and sufficient condition for the
existence of $\sigma^2(U)$ is the existence of

$$\zeta_m = E\{\Psi_m^2(X_1, \dots, X_m)\} \tag{5.26}$$

or that of $E\{\Phi^2(X_1, \dots, X_m)\}$.

If $\zeta_1 > 0$, $\sigma^2(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order $d$ for $F = F_0$, that is, if (5.11) is satisfied, $\sigma^2(U)$
is of order $n^{-d-1}$. Only if, for some $F = F_0$, $\theta(F)$ is stationary of order $m$, where
$m$ is the degree of $\theta(F)$, we have $\sigma^2(U) = 0$, and $U$ is equal to a constant with
probability 1.

For instance, if $\theta(F_0) = 0$, the functional $\theta^2(F)$ is stationary for $F = F_0$.
Other examples of stationary “points” of a functional will be found in section 9d.
For proving Theorem 5.1 we shall require the following: 

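Lemma 5.1. If

$$\delta_d = \zeta_d - \binom{d}{1}\zeta_{d-1} + \binom{d}{2}\zeta_{d-2} - \cdots + (-1)^{d-1}\binom{d}{d-1}\zeta_1, \tag{5.27}$$

we have

$$\delta_d \ge 0, \quad (d = 1, \dots, m), \tag{5.28}$$

and

$$\zeta_d = \delta_d + \binom{d}{1}\delta_{d-1} + \cdots + \binom{d}{d-1}\delta_1. \tag{5.29}$$

Proof. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$$\eta_0 = \theta^2, \quad \eta_c = E\{\Phi_c^2(X_1, \dots, X_c)\}, \quad (c = 1, \dots, m).$$

Then, by (5.10),

$$\zeta_c = \eta_c - \eta_0,$$

and on substituting this in (5.27) we have

$$\delta_d = \sum_{c=0}^{d} (-1)^{d-c}\binom{d}{c}\eta_c.$$

From (5.9) it is seen that (5.28) is true for $d = 1$. Suppose that (5.28) holds
for $1, \dots, d - 1$. Then (5.28) will be shown to hold for $d$.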





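Let

$$\bar\Phi_0(x_1) = \Phi_1(x_1) - \theta, \quad \bar\Phi_c(x_1, \dots, x_{c+1}) = \Phi_{c+1}(x_1, \dots, x_{c+1}) - \Phi_c(x_2, \dots, x_{c+1}), \quad (c = 1, \dots, d-1).$$

For an arbitrary fixed $x_1$, let

$$\bar\eta_c(x_1) = E\{\bar\Phi_c^2(x_1, X_2, \dots, X_{c+1})\}, \quad (c = 0, \dots, d-1).$$

Then, by induction hypothesis,

$$\bar\delta_{d-1}(x_1) = \sum_{c=0}^{d-1} (-1)^{d-1-c}\binom{d-1}{c}\bar\eta_c(x_1) \ge 0$$

for any fixed $x_1$.

Now,

$$E\{\bar\eta_c(X_1)\} = \eta_{c+1} - \eta_c,$$

and hence

$$E\{\bar\delta_{d-1}(X_1)\} = \sum_{c=0}^{d-1} (-1)^{d-1-c}\binom{d-1}{c}(\eta_{c+1} - \eta_c) = \sum_{c=0}^{d} (-1)^{d-c}\binom{d}{c}\eta_c = \delta_d.$$

The proof of Lemma 5.1 is complete.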
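The relations (5.27) and (5.29) are inverse binomial transforms of each other, which can be checked numerically. The following Python sketch is only an illustration under that reading; the function names and the sample values of the deltas are hypothetical and do not come from the paper.

```python
from math import comb


def deltas_from_zetas(zetas):
    """delta_d by (5.27); zetas[c - 1] holds zeta_c for c = 1..m."""
    m = len(zetas)
    return [
        sum((-1) ** (d - c) * comb(d, c) * zetas[c - 1] for c in range(1, d + 1))
        for d in range(1, m + 1)
    ]


def zetas_from_deltas(deltas):
    """zeta_d by (5.29); deltas[a - 1] holds delta_a for a = 1..m."""
    m = len(deltas)
    return [
        sum(comb(d, a) * deltas[a - 1] for a in range(1, d + 1))
        for d in range(1, m + 1)
    ]


# Hypothetical non-negative deltas (illustrative values only).
deltas = [1, 2, 3]
zetas = zetas_from_deltas(deltas)
print(zetas)                      # zeta_1, zeta_2, zeta_3
print(deltas_from_zetas(zetas))   # recovers the deltas
```

With non-negative deltas the resulting ratios zeta_c / c come out non-decreasing in c, in line with (5.19).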

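Proof of Theorem 5.1. By (5.29) we have for $c < d$

$$c\zeta_d - d\zeta_c = c\sum_{a=1}^{d}\binom{d}{a}\delta_a - d\sum_{a=1}^{c}\binom{c}{a}\delta_a = \sum_{a=1}^{c}\left[c\binom{d}{a} - d\binom{c}{a}\right]\delta_a + c\sum_{a=c+1}^{d}\binom{d}{a}\delta_a. \tag{5.30}$$

From (5.28), and since $c\binom{d}{a} - d\binom{c}{a} \ge 0$ if $1 \le a \le c \le d$, it follows that each
term in the two sums of (5.30) is not negative. This, in connection with (5.9),
proves Theorem 5.1.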

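Proof of Theorem 5.2. From (5.19) we have

$$c\,\zeta_1 \le \zeta_c \le \frac{c}{m}\,\zeta_m, \quad (c = 1, \dots, m).$$

Applying these inequalities to each term in (5.13) and using the identity

$$\binom{n}{m}^{-1}\sum_{c=1}^{m} c\binom{m}{c}\binom{n-m}{m-c} = \frac{m^2}{n}, \tag{5.31}$$

we obtain (5.20).

(5.22) and (5.23) follow immediately from (5.13).
For (5.21) we may write

$$D_n \ge 0, \tag{5.32}$$

where

$$D_n = n\,\sigma^2(U_n) - (n+1)\,\sigma^2(U_{n+1}).$$

Let

$$D_n = \sum_{c=1}^{m} d_{n,c}\,\zeta_c.$$

Then we have from (5.13)

$$d_{n,c} = n\binom{m}{c}\binom{n-m}{m-c}\binom{n}{m}^{-1} - (n+1)\binom{m}{c}\binom{n+1-m}{m-c}\binom{n+1}{m}^{-1}, \quad (1 \le c \le m \le n). \tag{5.33}$$

A short calculation shows that $d_{n,c}$ has the sign of $(c-1)n - (m-1)^2$. Putting

$$c_0 = 1 + \left[\frac{(m-1)^2}{n}\right],$$

where $[u]$ denotes the largest integer $\le u$, we have

$$d_{n,c} \le 0 \text{ if } c \le c_0, \qquad d_{n,c} \ge 0 \text{ if } c > c_0.$$

Hence, by (5.19),

$$d_{n,c}\,\zeta_c \ge \frac{1}{c_0}\,\zeta_{c_0}\,c\,d_{n,c}, \quad (c = 1, \dots, m),$$

$$D_n \ge \frac{1}{c_0}\,\zeta_{c_0}\sum_{c=1}^{m} c\,d_{n,c}.$$

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).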

For the stationary case $\zeta_1 = \cdots = \zeta_{d-1} = 0$, (5.24) is a direct consequence of
(5.13) and (5.19). The proof of Theorem 5.2 is complete.
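The bounds (5.20) and the monotonicity (5.21) can be checked numerically from the variance formula (5.13). The sketch below is a minimal illustration in exact rational arithmetic; the function name `u_variance` and the zeta values are hypothetical choices, not quantities from the paper.

```python
from fractions import Fraction
from math import comb


def u_variance(n, zetas):
    """Variance of a U-statistic by (5.13); zetas[c - 1] holds zeta_c, c = 1..m."""
    m = len(zetas)
    if n < m:
        raise ValueError("the sample size n must be at least the degree m")
    total = sum(
        comb(m, c) * comb(n - m, m - c) * Fraction(zetas[c - 1])
        for c in range(1, m + 1)
    )
    return total / comb(n, m)


# Hypothetical zeta values for a kernel of degree 3 (illustrative only).
# They are chosen so that zeta_c / c is non-decreasing, as (5.19) requires.
example_zetas = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)]
m = len(example_zetas)

for n in range(m, 9):
    variance = u_variance(n, example_zetas)
    lower = Fraction(m * m, n) * example_zetas[0]   # left side of (5.20)
    upper = Fraction(m, n) * example_zetas[-1]      # right side of (5.20)
    print(n, lower, variance, upper)
```

For n = m the variance equals zeta_m, which is (5.22); as n grows, n times the variance decreases toward m^2 zeta_1, which is (5.23).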

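6. The covariance of two $U$-statistics. Consider a set of $g$ $U$-statistics,

$$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1}\sum{}'\,\Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \quad (\gamma = 1, \dots, g),$$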





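each $U^{(\gamma)}$ being a function of the same $n$ independent, identically distributed
random vectors $X_1, \dots, X_n$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric
in its $m(\gamma)$ arguments $(\gamma = 1, \dots, g)$.

Let

$$E\{U^{(\gamma)}\} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\} = \theta^{(\gamma)}, \quad (\gamma = 1, \dots, g);$$

$$\Psi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) = \Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) - \theta^{(\gamma)}, \quad (\gamma = 1, \dots, g); \tag{6.1}$$

$$\Psi_c^{(\gamma)}(x_1, \dots, x_c) = E\{\Psi^{(\gamma)}(x_1, \dots, x_c, X_{c+1}, \dots, X_{m(\gamma)})\}, \quad (c = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g); \tag{6.2}$$

$$\zeta_c^{(\gamma,\delta)} = E\{\Psi_c^{(\gamma)}(X_1, \dots, X_c)\,\Psi_c^{(\delta)}(X_1, \dots, X_c)\}, \quad (\gamma, \delta = 1, \dots, g). \tag{6.3}$$

If, in particular, $\gamma = \delta$, we shall write

$$\zeta_c^{(\gamma)} = \zeta_c^{(\gamma,\gamma)} = E\{\Psi_c^{(\gamma)}(X_1, \dots, X_c)\}^2. \tag{6.4}$$

Let

$$\sigma(U^{(\gamma)}, U^{(\delta)}) = E\{(U^{(\gamma)} - \theta^{(\gamma)})(U^{(\delta)} - \theta^{(\delta)})\}$$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma) \le m(\delta)$,

$$\sigma(U^{(\gamma)}, U^{(\delta)}) = \binom{n}{m(\gamma)}^{-1}\sum_{c=1}^{m(\gamma)}\binom{m(\delta)}{c}\binom{n - m(\delta)}{m(\gamma) - c}\,\zeta_c^{(\gamma,\delta)}. \tag{6.5}$$

The right hand side is easily seen to be symmetric in $\gamma$, $\delta$.

For $\gamma = \delta$, (6.5) is the variance of $U^{(\gamma)}$ (cf. (5.13)).

We have from (5.23) and (6.5)

$$\lim_{n\to\infty} n\,\sigma^2(U^{(\gamma)}) = m^2(\gamma)\,\zeta_1^{(\gamma)},$$

$$\lim_{n\to\infty} n\,\sigma(U^{(\gamma)}, U^{(\delta)}) = m(\gamma)\,m(\delta)\,\zeta_1^{(\gamma,\delta)}.$$

Hence, if $\zeta_1^{(\gamma)} \ne 0$ and $\zeta_1^{(\delta)} \ne 0$, the product moment correlation $\rho(U^{(\gamma)}, U^{(\delta)})$
between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

$$\lim_{n\to\infty}\rho(U^{(\gamma)}, U^{(\delta)}) = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)}\,\zeta_1^{(\delta)}}}. \tag{6.6}$$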
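As a small numerical companion to (6.5), the covariance formula can be evaluated exactly for given cross moments. This is a sketch only: the name `u_covariance`, its argument order, and the numerical inputs are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def u_covariance(n, m_gamma, m_delta, cross_zetas):
    """Covariance of two U-statistics by (6.5).

    cross_zetas[c - 1] holds zeta_c^(gamma, delta) for c = 1..min(m_gamma, m_delta).
    """
    small, large = sorted((m_gamma, m_delta))
    if n < large:
        raise ValueError("the sample size n must be at least the larger degree")
    total = sum(
        comb(large, c) * comb(n - large, small - c) * Fraction(cross_zetas[c - 1])
        for c in range(1, small + 1)
    )
    return total / comb(n, small)


# A statistic of degree 1 paired with one of degree 2; zeta_1 = 1/3 is hypothetical.
print(u_covariance(10, 1, 2, [Fraction(1, 3)]))   # equals 2 * zeta_1 / n
```

Sorting the two degrees first reflects the remark that the right hand side of (6.5) is symmetric in the two statistics.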




7. Limit theorems for the case of identically distributed $X_\alpha$’s. We shall now
study the asymptotic distribution of $U$-statistics and certain related functions.
In this section the vectors $X_\alpha$ will be assumed to be identically distributed. An
extension to the case of different parent distributions will be given in section 8.

Following Cramér [2, p. 83] we shall say that a sequence of d.f.’s $F_1(x)$,
$F_2(x), \dots$ converges to a d.f. $F(x)$ if $\lim F_n(x) = F(x)$ in every point at which
the one-dimensional marginal limiting d.f.’s are continuous.





Let us recall (cf. Cramér [2, p. 312]) that a $g$-variate normal distribution is
called non-singular if the rank $r$ of its covariance matrix is equal to $g$, and singular
if $r < g$.

The following lemma will be used in the proofs. 

Lemma 7.1. Let $V_1, V_2, \dots$ be an infinite sequence of random vectors
$V_n = (V_n^{(1)}, \dots, V_n^{(g)})$, and suppose that the d.f. $F_n(v)$ of $V_n$ tends to a d.f. $F(v)$ as
$n \to \infty$. Let $V_n^{(\gamma)\prime} = V_n^{(\gamma)} + d_n^{(\gamma)}$, where

$$\lim_{n\to\infty} E\{d_n^{(\gamma)}\}^2 = 0, \quad (\gamma = 1, \dots, g). \tag{7.1}$$

Then the d.f. of $V_n' = (V_n^{(1)\prime}, \dots, V_n^{(g)\prime})$ tends to $F(v)$.

This is an immediate consequence of the well-known fact that the d.f. of $V_n'$
tends to $F(v)$ if $d_n^{(\gamma)}$ converges in probability to 0 (cf. Cramér [2, p. 299]), since
the fulfillment of (7.1) is sufficient for the latter condition.

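Theorem 7.1. Let $X_1, \dots, X_n$ be $n$ independent, identically distributed random
vectors,

$$X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \quad (\alpha = 1, \dots, n).$$

Let

$$\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}), \quad (\gamma = 1, \dots, g),$$

be $g$ real-valued functions not involving $n$, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)$ $(\le n)$
vector arguments $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)})$, $(\alpha = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g)$.
Define

$$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1}\sum{}'\,\Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \quad (\gamma = 1, \dots, g), \tag{7.2}$$

where the summation is over all subscripts such that $1 \le \alpha_1 < \cdots < \alpha_{m(\gamma)} \le n$.

Then, if the expected values

$$\theta^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}, \quad (\gamma = 1, \dots, g), \tag{7.3}$$

and

$$E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2, \quad (\gamma = 1, \dots, g), \tag{7.4}$$

exist, the joint d.f. of

$$\sqrt{n}\,(U^{(1)} - \theta^{(1)}), \dots, \sqrt{n}\,(U^{(g)} - \theta^{(g)})$$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix
$(m(\gamma)\,m(\delta)\,\zeta_1^{(\gamma,\delta)})$, where $\zeta_1^{(\gamma,\delta)}$ is defined by (6.3). The limiting distribution is
non-singular if the determinant $|\zeta_1^{(\gamma,\delta)}|$ is positive.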

Before proving Theorem 7.1, a few words may be said about its meaning and 
its relation to well-known results. 

For $g = 1$, Theorem 7.1 states that the distribution of a $U$-statistic tends, under
certain conditions, to the normal form. For $m = 1$, $U$ is the sum of $n$ independent
random variables, and in this case Theorem 7.1 reduces to the Central
Limit Theorem for such sums. For $m > 1$, $U$ is a sum of random variables
which, in general, are not independent. Under certain assumptions about the
function $\Phi(x_1, \dots, x_m)$ the asymptotic normality of $U$ can be inferred from
the Central Limit Theorem by well-known methods. If, for instance, $\Phi$ is a
polynomial (as in the case of the $k$-statistics or the unbiased estimates of moments),
$U$ can be expressed as a polynomial in moments about the origin which
are sums of independent random variables, and for this case the tendency to
normality of $U$ can easily be shown (cf. Cramér [2, p. 365]).

Theorem 7.1 generalizes these results, stating that in the case of independent
and identically distributed $X_\alpha$’s the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$ is sufficient
for the asymptotic normality of $U$. No regularity conditions are imposed on the
function $\Phi$. This point is important for some applications (cf. section 9).
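To make the definition (7.2) concrete, a $U$-statistic is simply the average of a symmetric kernel over all subsets of size m of the sample. The following Python sketch illustrates this with exact arithmetic; the function names and the choice of kernel are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(sample, kernel, m):
    """Average of a symmetric kernel over all m-subsets of the sample, as in (7.2)."""
    values = [kernel(*subset) for subset in combinations(sample, m)]
    return sum(values) / len(values)


def variance_kernel(x1, x2):
    """Kernel of degree 2 whose U-statistic is the unbiased sample variance."""
    return (x1 - x2) ** 2 / 2


sample = [Fraction(v) for v in (2, 4, 4, 7)]
print(u_statistic(sample, variance_kernel, 2))   # unbiased variance of the sample
```

No smoothness of the kernel is used anywhere in the computation, which mirrors the remark that Theorem 7.1 imposes no regularity conditions on the kernel.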

Theorem 7.1 and the following theorems of sections 7 and 8 are closely related
to recent results of von Mises [18] which were published after this paper was
essentially completed. It will be seen below (Theorem 7.4) that the limiting
distribution of $\sqrt{n}\,[U - \theta(F)]$ is the same as that of $\sqrt{n}\,[\theta(S) - \theta(F)]$ (cf. (4.5))
if the variance of $\theta(S)$ exists. $\theta(S)$ is a differentiable statistical function in the
sense of von Mises, and by Theorem I of [18], $\sqrt{n}\,[\theta(S) - \theta(F)]$ is asymptotically
normal if certain conditions are satisfied. It will be found that in certain cases,
for instance if the kernel $\Phi$ of $\theta$ is a polynomial, the conditions of the theorems of
sections 7 and 8 are somewhat weaker than those of von Mises’ theorem.
Though von Mises’ paper is concerned with functionals of univariate d.f.’s only,
its results can easily be extended to the multivariate case.

For the particular case of a discrete population (where $F$ is a step function),
$U$ and $\theta(S)$ are polynomials in the sample frequencies, and their asymptotic
distribution may be inferred from the fact that the joint distribution of the
frequencies tends to the normal form (cf. also von Mises [18]).

In Theorem 7.1 the functions $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ are supposed to be symmetric.
Since, as has been seen in section 4, any $U$-statistic with non-symmetric
kernel can be written in the form (4.4) with a symmetric kernel, this restriction
is not essential and has been made only for the sake of convenience. Moreover,
in the condition of the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$, the symmetric kernel
may be replaced by a non-symmetric one. For, if $\Phi$ is non-symmetric, and $\Phi_0$ is
the symmetric kernel defined by (3.3), $E\{\Phi_0^2(X_1, \dots, X_m)\}$ is a linear combination
of terms of the form $E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m})\,\Phi(X_{\beta_1}, \dots, X_{\beta_m})\}$, whose existence
follows from that of $E\{\Phi^2(X_1, \dots, X_m)\}$ by Schwarz’s inequality.

If the regular functional $\theta(F)$ is stationary for $F = F_0$, that is, if $\zeta_1 = \zeta_1(F_0) = 0$
(cf. section 5), the limiting normal distribution of $\sqrt{n}\,(U - \theta)$ is, according to
Theorem 7.1, singular, that is, its variance is zero. As has been seen in section
5, $\sigma^2(U)$ need not be zero in this case, but may be of some order $n^{-c}$,
$(c = 2, 3, \dots, m)$, and the distribution of $n^{c/2}(U - \theta)$ may tend to a limiting
form which is not normal. According to von Mises [18], it is a limiting
distribution of type $c$, $(c = 2, 3, \dots)$.





According to Theorem 5.2, $\sigma^2(U)$ exceeds its asymptotic value $m^2\zeta_1/n$ for any
finite $n$. Hence, if we apply Theorem 7.1 for approximating the distribution of
$U$ when $n$ is large but finite, we underestimate the variance of $U$. For many
applications this is undesirable, and for such cases the following theorem, which
is an immediate consequence of Theorem 7.1, will be more useful.

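Theorem 7.2. Under the conditions of Theorem 7.1, and if

$$\zeta_1^{(\gamma)} > 0, \quad (\gamma = 1, \dots, g),$$

the joint d.f. of

$$(U^{(1)} - \theta^{(1)})/\sigma(U^{(1)}), \dots, (U^{(g)} - \theta^{(g)})/\sigma(U^{(g)})$$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix
$(\rho^{(\gamma,\delta)})$, where

$$\rho^{(\gamma,\delta)} = \lim_{n\to\infty}\frac{\sigma(U^{(\gamma)}, U^{(\delta)})}{\sigma(U^{(\gamma)})\,\sigma(U^{(\delta)})} = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)}\,\zeta_1^{(\delta)}}}, \quad (\gamma, \delta = 1, \dots, g).$$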

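Proof of Theorem 7.1. The existence of (7.4) entails that of

$$\zeta_{m(\gamma)}^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2 - (\theta^{(\gamma)})^2,$$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of
$\zeta_1^{(\gamma)}, \dots, \zeta_{m(\gamma)-1}^{(\gamma)}$, of $\sigma^2(U^{(\gamma)})$, and of $\zeta_1^{(\gamma,\delta)}$, $(\gamma, \delta = 1, \dots, g)$.
Now, consider the $g$ quantities

$$Y^{(\gamma)} = \frac{m(\gamma)}{\sqrt{n}}\sum_{\alpha=1}^{n}\Psi_1^{(\gamma)}(X_\alpha), \quad (\gamma = 1, \dots, g),$$

where $\Psi_1^{(\gamma)}(x)$ is defined by (6.2). $Y^{(1)}, \dots, Y^{(g)}$ are sums of $n$ independent
random variables with zero means, whose covariance matrix, by virtue of (6.3), is

$$\{\sigma(Y^{(\gamma)}, Y^{(\delta)})\} = \{m(\gamma)\,m(\delta)\,\zeta_1^{(\gamma,\delta)}\}. \tag{7.5}$$

By the Central Limit Theorem for vectors (cf. Cramér [1, p. 112]), the joint d.f.
of $(Y^{(1)}, \dots, Y^{(g)})$ tends to the normal $g$-variate d.f. with the same means and
covariances.

Theorem 7.1 will be proved by showing that the $g$ random variables

$$Z^{(\gamma)} = \sqrt{n}\,(U^{(\gamma)} - \theta^{(\gamma)}), \quad (\gamma = 1, \dots, g), \tag{7.6}$$

have the same joint limiting distribution as $Y^{(1)}, \dots, Y^{(g)}$.

According to Lemma 7.1 it is sufficient to show that

$$\lim_{n\to\infty} E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = 0, \quad (\gamma = 1, \dots, g). \tag{7.7}$$

For proving (7.7), write

$$E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = E\{Z^{(\gamma)}\}^2 + E\{Y^{(\gamma)}\}^2 - 2E\{Z^{(\gamma)}Y^{(\gamma)}\}. \tag{7.8}$$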




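By (5.13) we have

$$E\{Z^{(\gamma)}\}^2 = n\,\sigma^2(U^{(\gamma)}) = m^2(\gamma)\,\zeta_1^{(\gamma)} + O(n^{-1}), \tag{7.9}$$

and from (7.5),

$$E\{Y^{(\gamma)}\}^2 = m^2(\gamma)\,\zeta_1^{(\gamma)}. \tag{7.10}$$

By (7.2) and (6.1) we may write for (7.6)

$$Z^{(\gamma)} = \sqrt{n}\binom{n}{m(\gamma)}^{-1}\sum{}'\,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}),$$

and hence

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1}\sum_{\alpha=1}^{n}\sum{}'\,E\{\Psi_1^{(\gamma)}(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}.$$

The term

$$E\{\Psi_1^{(\gamma)}(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}$$

is $= \zeta_1^{(\gamma)}$ if

$$\alpha_1 = \alpha \ \text{ or } \ \alpha_2 = \alpha \ \cdots \ \text{ or } \ \alpha_{m(\gamma)} = \alpha, \tag{7.11}$$

and 0 otherwise. For a fixed $\alpha$, the number of sets $\{\alpha_1, \dots, \alpha_{m(\gamma)}\}$ such that
$1 \le \alpha_1 < \cdots < \alpha_{m(\gamma)} \le n$ and (7.11) is satisfied, is $\binom{n-1}{m(\gamma)-1}$. Thus,

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1} n\binom{n-1}{m(\gamma)-1}\zeta_1^{(\gamma)} = m^2(\gamma)\,\zeta_1^{(\gamma)}. \tag{7.12}$$

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.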

The concluding remark in Theorem 7.1 is a direct consequence of the definition 
of a non-singular distribution. The proof of Theorem 7.1 is complete. 

Theorems 7.1 and 7.2 deal with the asymptotic distribution of $U^{(1)}, \dots, U^{(g)}$,
which are unbiased estimates of $\theta^{(1)}, \dots, \theta^{(g)}$. The unbiasedness of a statistic
is, of course, irrelevant for its asymptotic behavior, and the application of Lemma
7.1 leads immediately to the following extension of Theorem 7.1 to a larger class
of statistics.

Theorem 7.3. Let

$$U^{(\gamma)\prime} = U^{(\gamma)} + \frac{b_n^{(\gamma)}}{\sqrt{n}}, \quad (\gamma = 1, \dots, g), \tag{7.13}$$

where $U^{(\gamma)}$ is defined by (7.2) and $b_n^{(\gamma)}$ is a random variable. If the conditions of
Theorem 7.1 are satisfied, and $\lim E\{b_n^{(\gamma)}\}^2 = 0$, $(\gamma = 1, \dots, g)$, then the joint
distribution of

$$\sqrt{n}\,(U^{(1)\prime} - \theta^{(1)}), \dots, \sqrt{n}\,(U^{(g)\prime} - \theta^{(g)})$$

tends to the normal distribution with zero means and covariance matrix
$(m(\gamma)\,m(\delta)\,\zeta_1^{(\gamma,\delta)})$.


This theorem applies, in particular, to the regular functionals $\theta(S)$ of the
sample d.f.,

$$\theta(S) = \frac{1}{n^m}\sum_{\alpha_1=1}^{n}\cdots\sum_{\alpha_m=1}^{n}\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

in the case that the variance of $\theta(S)$ exists. For we may write

$$n^m\,\theta(S) = m!\binom{n}{m}U + \sum{}^{*}\,\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

where the sum $\sum^{*}$ is extended over all $m$-tuplets $(\alpha_1, \dots, \alpha_m)$ in which at least
one equality $\alpha_i = \alpha_j$ $(i \ne j)$ is satisfied. The number of terms in $\sum^{*}$ is of order
$n^{m-1}$. Hence

$$\theta(S) - U = \frac{1}{n}\,D,$$

where the expected value $E\{D^2\}$, whose existence follows from that of $\sigma^2\{\theta(S)\}$,
is bounded for $n \to \infty$. Thus, if we put $U^{(\gamma)\prime} = \theta^{(\gamma)}(S)$, the conditions
of Theorem 7.3 are fulfilled. We may summarize this result as follows:
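The splitting of the $n^m$ terms of the sample functional into terms with distinct subscripts and terms with at least one repeated subscript can be checked directly on a small sample. The sketch below is illustrative only; the helper names and the product kernel are assumptions chosen so that the tied terms do not vanish.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial


def u_stat(sample, kernel, m):
    """U: average of the kernel over all m-subsets of distinct sample members."""
    subsets = list(combinations(sample, m))
    return Fraction(sum(kernel(*s) for s in subsets), len(subsets))


def split_power_sum(sample, kernel, m):
    """Split the n^m-term sum into its distinct-subscript and tied-subscript parts."""
    n = len(sample)
    distinct = tied = 0
    for idx in product(range(n), repeat=m):
        value = kernel(*(sample[i] for i in idx))
        if len(set(idx)) == m:
            distinct += value
        else:
            tied += value
    return distinct, tied


def product_kernel(x1, x2):
    """Symmetric kernel of degree 2 (hypothetical example): theta(F) is the squared mean."""
    return x1 * x2


sample = [1, 2, 3]
n, m = len(sample), 2
distinct, tied = split_power_sum(sample, product_kernel, m)
u = u_stat(sample, product_kernel, m)
theta_s = Fraction(distinct + tied, n ** m)
print(u, theta_s, theta_s - u)
```

For a symmetric kernel the distinct-subscript part equals m! times the binomial coefficient times U, so the difference between the two statistics comes only from the tied terms, of which there are of order n^(m-1).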

Theorem 7.4. Let $X_1, \dots, X_n$ be a random sample from an $r$-variate population
with d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

$$\theta^{(\gamma)}(F) = \int\cdots\int\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})\,dF(x_1)\cdots dF(x_{m(\gamma)}), \quad (\gamma = 1, \dots, g),$$

be $g$ regular functionals of $F$, where $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ is symmetric in the vectors
$x_1, \dots, x_{m(\gamma)}$ and does not involve $n$. If $S(x)$ is the d.f. of the random sample,
and if the variance of

$$\theta^{(\gamma)}(S) = \frac{1}{n^{m(\gamma)}}\sum_{\alpha_1=1}^{n}\cdots\sum_{\alpha_{m(\gamma)}=1}^{n}\Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})$$

exists, the joint d.f. of

$$\sqrt{n}\,\{\theta^{(1)}(S) - \theta^{(1)}(F)\}, \dots, \sqrt{n}\,\{\theta^{(g)}(S) - \theta^{(g)}(F)\}$$

tends to the $g$-variate normal d.f. with zero means and covariance matrix
$(m(\gamma)\,m(\delta)\,\zeta_1^{(\gamma,\delta)})$.

The following theorem is concerned with the asymptotic distribution of a 
function of statistics of the form U or U'. 

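Theorem 7.5. Let $(U') = (U^{(1)\prime}, \dots, U^{(g)\prime})$ be a random vector, where $U^{(\gamma)\prime}$
is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied.
If the function $h(y) = h(y^{(1)}, \dots, y^{(g)})$ does not involve $n$ and is continuous together
with its second order partial derivatives in some neighborhood of the point $(y) = (\theta) =
(\theta^{(1)}, \dots, \theta^{(g)})$, then the distribution of the random variable $\sqrt{n}\,\{h(U') - h(\theta)\}$
tends to the normal distribution with mean zero and variance

$$\sum_{\gamma=1}^{g}\sum_{\delta=1}^{g} m(\gamma)\,m(\delta)\left(\frac{\partial h}{\partial y^{(\gamma)}}\right)_{y=\theta}\left(\frac{\partial h}{\partial y^{(\delta)}}\right)_{y=\theta}\zeta_1^{(\gamma,\delta)}.$$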


Theorem 7.5 follows from Theorem 7.3 in exactly the same way as the theorem
on the asymptotic distribution of a function of moments follows from the fact
of their asymptotic normality; cf. Cramér [2, p. 366]. We shall therefore omit
the proof of Theorem 7.5. Since any moment whose variance exists has the
form $U' = \theta(S)$ (cf. section 4 and Theorem 7.4), Theorem 7.5 is a generalization
of the theorem on a function of moments.

8. Limit theorems for $U(X_1, \dots, X_n)$ when the $X_\alpha$’s have different distributions.
The limit theorems of the preceding section can be extended to the
case when the $X_\alpha$’s have different distributions. We shall only prove an extension
to this case of Theorem 7.1 (or 7.2), confining ourselves, for the sake of
simplicity, to the distribution of a single $U$-statistic.

The extension of Theorems 7.3 and 7.5 with $g = 1$ to this case is immediate.
One has only to replace the reference to Theorem 7.1 by that to the following
Theorem 8.1, and $\theta$ and $\zeta_1$ by $E\{U\}$ and $\zeta_{1,n}$.

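Theorem 8.1. Let $X_1, \dots, X_n$ be $n$ independent random vectors of $r$ components,
$X_\alpha$ having the d.f. $F_\alpha(x) = F_\alpha(x^{(1)}, \dots, x^{(r)})$. Let $\Phi(x_1, \dots, x_m)$ be a
function symmetric in its $m$ vector arguments $x_\beta = (x_\beta^{(1)}, \dots, x_\beta^{(r)})$ which does not
involve $n$, and let

$$\bar\Psi_{1(\nu)}(x) = \binom{n-1}{m-1}^{-1}\sum{}'\,\Psi_{1(\nu)\alpha_1,\dots,\alpha_{m-1}}(x), \quad (\nu = 1, \dots, n), \tag{8.1}$$

where $\Psi_{1(\nu)\alpha_1,\dots,\alpha_{m-1}}$ is defined by (5.15), and the summation is extended over all
subscripts $\alpha$ such that

$$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_{m-1} \le n, \quad \alpha_i \ne \nu, \quad (i = 1, \dots, m-1).$$

Suppose that there is a number $A$ such that for every $n = 1, 2, \dots$

$$\int\cdots\int\Phi^2(x_1, \dots, x_m)\,dF_{\alpha_1}(x_1)\cdots dF_{\alpha_m}(x_m) < A, \quad (1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n), \tag{8.2}$$

that

$$E\,|\bar\Psi_{1(\nu)}^3(X_\nu)| < \infty, \quad (\nu = 1, 2, \dots, n), \tag{8.3}$$

and

$$\lim_{n\to\infty}\ \sum_{\nu=1}^{n} E\,|\bar\Psi_{1(\nu)}^3(X_\nu)| \Big/ \left\{\sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\}\right\}^{3/2} = 0. \tag{8.4}$$

Then, as $n \to \infty$, the d.f. of $(U - E\{U\})/\sigma(U)$ tends to the normal d.f. with mean
0 and variance 1.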

The proof is similar to that of Theorem 7.1. 

Let

$$W = \frac{m}{n}\sum_{\nu=1}^{n}\bar\Psi_{1(\nu)}(X_\nu).$$

It will be shown that
(a) the d.f. of

$$V = \frac{W - E\{W\}}{\sigma(W)}$$

tends to the normal d.f. with mean 0 and variance 1, and that
(b) the d.f. of

$$V' = \frac{U - E\{U\}}{\sigma(U)}$$

tends to the same limit as the d.f. of $V$.

Part (a) follows immediately from (8.3) and (8.4) by Liapounoff’s form of the
Central Limit Theorem.

According to Lemma 7.1, (b) will be proved when it is shown that

$$\lim_{n\to\infty} E\{V' - V\}^2 = \lim_{n\to\infty}\left\{2 - 2\,\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)}\right\} = 0$$

or

$$\lim_{n\to\infty}\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = 1. \tag{8.5}$$

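Let $c$ be an integer, $1 \le c \le m$, and write

$$x = (x_1, \dots, x_c), \quad y = (y_1, \dots, y_{m-c}), \quad z = (z_1, \dots, z_{m-c}),$$

$$F_{(\alpha)}(x) = F_{\alpha_1}(x_1)\cdots F_{\alpha_c}(x_c), \quad F_{(\beta)}(y) = F_{\beta_1}(y_1)\cdots F_{\beta_{m-c}}(y_{m-c}), \quad F_{(\gamma)}(z) = F_{\gamma_1}(z_1)\cdots F_{\gamma_{m-c}}(z_{m-c}).$$

Then, by Schwarz’s inequality,

$$\left|\int\cdots\int\Phi(x, y)\,\Phi(x, z)\,dF_{(\alpha)}(x)\,dF_{(\beta)}(y)\,dF_{(\gamma)}(z)\right| \le \left\{\int\cdots\int\Phi^2(x, y)\,dF_{(\alpha)}(x)\,dF_{(\beta)}(y)\cdot\int\cdots\int\Phi^2(x, z)\,dF_{(\alpha)}(x)\,dF_{(\gamma)}(z)\right\}^{1/2},$$

which, by (8.2), is $< A$ for any set of subscripts.

By the inequality for moments, $\theta_{\alpha_1,\dots,\alpha_m}$, as defined by (5.14), is also uniformly
bounded, and applying these inequalities to (5.16), it follows that there
exists a number $B$ such that

$$|\zeta_{c(\alpha_1,\dots,\alpha_c)\beta_1,\dots,\beta_{m-c};\gamma_1,\dots,\gamma_{m-c}}| < B, \quad (c = 1, \dots, m), \tag{8.6}$$

for every set of subscripts satisfying the inequalities

$$\alpha_g \ne \alpha_h, \quad \beta_g \ne \beta_h, \quad \gamma_g \ne \gamma_h \ \text{ if } g \ne h, \quad \alpha_i \ne \beta_j, \quad \alpha_i \ne \gamma_j, \quad (i = 1, \dots, c;\ j = 1, \dots, m - c).$$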





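Now, we have

$$E\{W\} = 0$$

and

$$\sigma^2(W) = \frac{m^2}{n^2}\sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} \tag{8.7}$$

or, inserting (8.1) and recalling (5.16),

$$\sigma^2(W) = \frac{m^2}{n^2}\binom{n-1}{m-1}^{-2}\sum_{\nu=1}^{n}\ \sum_{(\alpha\ne\nu)}{}'\ \sum_{(\beta\ne\nu)}{}'\ \zeta_{1(\nu)\alpha_1,\dots,\alpha_{m-1};\beta_1,\dots,\beta_{m-1}}, \tag{8.8}$$

the two sums $\sum'$ being over $\alpha_1 < \cdots < \alpha_{m-1}$, $(\alpha_i \ne \nu)$, and $\beta_1 < \cdots < \beta_{m-1}$,
$(\beta_i \ne \nu)$, respectively. By (5.17), the sum of the terms whose subscripts
$\nu, \alpha_1, \dots, \alpha_{m-1}, \beta_1, \dots, \beta_{m-1}$ are all different is equal to

$$\frac{n(n-1)\cdots(n-2m+2)}{(m-1)!\,(m-1)!}\,\zeta_{1,n} = n\binom{n-1}{m-1}\binom{n-m}{m-1}\zeta_{1,n}.$$

The number of the remaining terms is of order $n^{2m-2}$. Since, by (8.6), they are
uniformly bounded, we have

$$\sigma^2(W) = \frac{m^2}{n}\,\zeta_{1,n} + O(n^{-2}). \tag{8.9}$$

Similarly, we have from (5.18)

$$\sigma^2(U) = \frac{m^2}{n}\,\zeta_{1,n} + O(n^{-2}),$$

and hence

$$\sigma(U) = \sigma(W) + O(n^{-1}). \tag{8.10}$$

The covariance of $U$ and $W$ is

$$\sigma(U, W) = \binom{n}{m}^{-1}\frac{m}{n}\sum_{\nu=1}^{n}\sum{}'\,E\{\bar\Psi_{1(\nu)}(X_\nu)\,\Psi_{m(\alpha_1,\dots,\alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\}. \tag{8.11}$$

All terms except those in which one of the $\alpha$’s $= \nu$, vanish, and for the
remaining ones we have, for fixed $\alpha_1, \dots, \alpha_m$,

$$E\{\bar\Psi_{1(\nu)}(X_\nu)\,\Psi_{m(\alpha_1,\dots,\alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\} = \binom{n-1}{m-1}^{-1}\sum{}'\,E\{\Psi_{1(\nu)\beta_1,\dots,\beta_{m-1}}(X_\nu)\,\Psi_{1(\nu)\gamma_1,\dots,\gamma_{m-1}}(X_\nu)\} = \binom{n-1}{m-1}^{-1}\sum{}'\,\zeta_{1(\nu)\beta_1,\dots,\beta_{m-1};\gamma_1,\dots,\gamma_{m-1}},$$

where the summation sign refers to the $\beta$’s, and $\gamma_1, \dots, \gamma_{m-1}$ are the $\alpha$’s that
are $\ne \nu$. Inserting this in (8.11) and comparing the result with (8.8), we see that

$$\sigma(U, W) = \sigma^2(W). \tag{8.12}$$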






From (8.12) and (8.10) we have

$$\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = \frac{\sigma(W)}{\sigma(U)} = \frac{n\,\sigma(W)}{n\,\sigma(W) + O(1)}.$$

Comparing condition (8.4) with (8.7), we see that we must have $n\,\sigma(W) \to \infty$
as $n \to \infty$. This shows the truth of (8.5). The proof of Theorem 8.1 is complete.

For some purposes the following corollary of Theorem 8.1 will be useful, where 
the conditions (8.2), (8.3), and (8.4) are replaced by other conditions which are 
more restrictive, but easier to apply. 

Theorem 8.2. Theorem 8.1 holds if the conditions (8.2), (8.3), and (8.4) are
replaced by the following:

There exist two positive numbers $C$, $D$ such that

$$\int\cdots\int|\Phi^3(x_1, \dots, x_m)|\,dF_{\alpha_1}(x_1)\cdots dF_{\alpha_m}(x_m) < C \tag{8.13}$$

for $\alpha_i = 1, 2, \dots$, $(i = 1, \dots, m)$, and

$$\zeta_{1(\nu)\alpha_1,\dots,\alpha_{m-1};\beta_1,\dots,\beta_{m-1}} > D \tag{8.14}$$

for any subscripts satisfying

$$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_{m-1}, \quad 1 \le \beta_1 < \beta_2 < \cdots < \beta_{m-1}, \quad 1 \le \nu \ne \alpha_i, \beta_i.$$

We have to show that (8.2), (8.3), and (8.4) follow from (8.13) and (8.14).
(8.13) implies (8.2) by the inequality for moments. By a reasoning analogous
to that used in the previous proof, applying Hölder’s inequality instead of
Schwarz’s inequality, it follows from (8.13) that

$$E\,|\bar\Psi_{1(\nu)}^3(X_\nu)| < C. \tag{8.15}$$

On the other hand, by (8.7), (8.8), and (8.14),

$$\sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} > nD. \tag{8.16}$$

(8.15) and (8.16) are sufficient for the fulfillment of (8.4).

9. Applications to particular statistics. 

(a) Moments and functions of moments. It has been seen in section 4 that the
$k$-statistics and the unbiased estimates of moments are $U$-statistics, while the
sample moments are regular functionals of the sample d.f. By Theorems 7.1,
8.1, and 7.4 these statistics are asymptotically normally distributed, and by
Theorem 7.5 the same is true for a function of moments, if the respective conditions
are satisfied. These results are not new (cf., for example, Cramér [2]).

(b) Mean difference and coefficient of concentration. If $Y_1, \dots, Y_n$ are $n$
independent real-valued random variables, Gini’s mean difference (without repetition)
is defined by

$$d = \frac{1}{n(n-1)}\sum_{\alpha\ne\beta}|Y_\alpha - Y_\beta|.$$

If the $Y_\alpha$’s have the same distribution $F$, the mean of $d$ is

$$\delta = \int\!\!\int|y_1 - y_2|\,dF(y_1)\,dF(y_2),$$

and the variance, by (5.13), is

$$\sigma^2(d) = \frac{2}{n(n-1)}\{2\zeta_1(\delta)(n-2) + \zeta_2(\delta)\},$$

where

$$\zeta_1(\delta) = \int\left\{\int|y_1 - y_2|\,dF(y_2)\right\}^2 dF(y_1) - \delta^2, \tag{9.1}$$

$$\zeta_2(\delta) = \int\!\!\int(y_1 - y_2)^2\,dF(y_1)\,dF(y_2) - \delta^2 = 2\sigma^2(Y) - \delta^2. \tag{9.2}$$

The notation $\zeta_1(\delta)$, $\zeta_2(\delta)$ serves to indicate the relation of these functionals of
$F$ to the functional $\delta(F)$; $\delta$ is here merely the symbol of the functional, not a
particular value of it. In a similar way we shall write $\Phi(y_1, y_2 \mid \delta) = |y_1 - y_2|$,
etc. When there is danger of confusing $\zeta_1(\delta)$ with $\zeta_1(F)$, we may write $\zeta_1(F \mid \delta)$.

U. S. Nair [19] has evaluated $\sigma(d)$ for several particular distributions.

By Theorem 7.1, $\sqrt{n}\,(d - \delta)$ is asymptotically normal if $\zeta_2(\delta)$ exists.

If $Y_1, \dots, Y_n$ do not assume negative values, the coefficient of concentration
(cf. Gini [8]) is defined by

$$G = \frac{d}{2\bar{Y}},$$

where $\bar{Y} = \sum Y_\alpha/n$. $G$ is a function of two $U$-statistics. If the $Y_\alpha$’s are identically
distributed, if $E\{Y^2\}$ exists, and if $\mu = E\{Y\} > 0$, then, by Theorem 7.5,
$\sqrt{n}\,(G - \delta/2\mu)$ tends to be normally distributed with mean 0 and variance

$$\frac{\delta^2}{4\mu^4}\,\zeta_1(\mu) - \frac{\delta}{\mu^3}\,\zeta_1(\mu, \delta) + \frac{1}{\mu^2}\,\zeta_1(\delta),$$

where

$$\zeta_1(\mu) = \int y^2\,dF(y) - \mu^2 = \sigma^2(Y),$$

$$\zeta_1(\mu, \delta) = \int\!\!\int y_1\,|y_1 - y_2|\,dF(y_1)\,dF(y_2) - \mu\delta,$$

and $\zeta_1(\delta)$ is given by (9.1).
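Gini's mean difference, the coefficient of concentration, and the variance formula for $d$ can be illustrated on a toy sample. The following Python sketch assumes integer or `Fraction` data; the function names are hypothetical. As a worked case, for the uniform distribution on (0, 1) direct integration of (9.1) and (9.2) gives $\zeta_1(\delta) = 1/180$ and $\zeta_2(\delta) = 1/18$; these two values are the editor's own calculation, not figures quoted from the paper.

```python
from fractions import Fraction
from itertools import combinations


def gini_mean_difference(ys):
    """d: average of |y_a - y_b| over all pairs of distinct sample members."""
    pairs = list(combinations(ys, 2))
    return Fraction(sum(abs(a - b) for a, b in pairs), len(pairs))


def concentration_coefficient(ys):
    """G = d / (2 * mean), for non-negative data with positive mean."""
    mean = Fraction(sum(ys), len(ys))
    return gini_mean_difference(ys) / (2 * mean)


def mean_difference_variance(n, zeta1, zeta2):
    """Variance of d from (5.13) with m = 2: 2 / (n (n - 1)) * (2 zeta1 (n - 2) + zeta2)."""
    return Fraction(2, n * (n - 1)) * (2 * zeta1 * (n - 2) + zeta2)


ys = [1, 2, 4]
print(gini_mean_difference(ys))
print(concentration_coefficient(ys))
# Uniform (0, 1) case with the zeta values stated in the lead-in (an assumption).
print(mean_difference_variance(10, Fraction(1, 180), Fraction(1, 18)))
```

Averaging over unordered pairs gives the same value as the double sum over ordered pairs divided by n(n - 1), since the kernel is symmetric.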

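(c) Functions of ranks and of the signs of variate differences. Let $s(u)$ be the
signum function,

$$s(u) = \begin{cases} -1 & \text{if } u < 0; \\ 0 & \text{if } u = 0; \\ 1 & \text{if } u > 0, \end{cases} \tag{9.3}$$

and let

$$c(u) = \tfrac{1}{2}\{1 + s(u)\} = \begin{cases} 0 & \text{if } u < 0; \\ \tfrac{1}{2} & \text{if } u = 0; \\ 1 & \text{if } u > 0. \end{cases} \tag{9.4}$$

If

$$x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \quad (\alpha = 1, \dots, n),$$

is a sample of $n$ vectors of $r$ components, we may define the rank $R_\alpha^{(i)}$ of $x_\alpha^{(i)}$ by

$$R_\alpha^{(i)} = \frac{1}{2} + \sum_{\beta=1}^{n} c(x_\alpha^{(i)} - x_\beta^{(i)}) = \frac{n+1}{2} + \frac{1}{2}\sum_{\beta=1}^{n} s(x_\alpha^{(i)} - x_\beta^{(i)}), \quad (i = 1, \dots, r). \tag{9.5}$$

If the numbers $x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)}$ are all different, the smallest of them
has rank 1, the next smallest rank 2, etc. If some of them are equal, the rank
as defined by (9.5) is known as the mid-rank.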
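Both forms of the rank definition (9.5) are easy to compute directly. The sketch below is an illustration in exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def sign(u):
    """The signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def c(u):
    """The function c(u) = (1 + s(u)) / 2 of (9.4)."""
    return Fraction(1 + sign(u), 2)


def mid_ranks(xs):
    """Ranks by the first form of (9.5): 1/2 plus the sum of c over the sample."""
    return [Fraction(1, 2) + sum(c(xa - xb) for xb in xs) for xa in xs]


def mid_ranks_via_signs(xs):
    """Ranks by the second form of (9.5): (n + 1) / 2 plus half the sum of signs."""
    n = len(xs)
    return [
        Fraction(n + 1, 2) + Fraction(sum(sign(xa - xb) for xb in xs), 2)
        for xa in xs
    ]


print(mid_ranks([10, 30, 20]))   # all values different: ordinary ranks
print(mid_ranks([5, 5, 7]))      # tied values share the mid-rank
```

The term with equal subscripts contributes c(0) = 1/2, which is why the leading constant in the first form is 1/2 rather than 1.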

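Any function of the ranks is a function of expressions $c(x_\alpha^{(i)} - x_\beta^{(i)})$ or
$s(x_\alpha^{(i)} - x_\beta^{(i)})$.

Conversely, since

$$s(x_\alpha^{(i)} - x_\beta^{(i)}) = s(R_\alpha^{(i)} - R_\beta^{(i)}),$$

any function of expressions $s(x_\alpha^{(i)} - x_\beta^{(i)})$ or $c(x_\alpha^{(i)} - x_\beta^{(i)})$ is a function of the
ranks.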

Consider a regular functional $\theta(F)$ whose kernel $\Phi(x_1, \dots, x_m)$ depends only
on the signs of the variate differences,

$$s(x_\alpha^{(i)} - x_\beta^{(i)}), \quad (\alpha, \beta = 1, \dots, m;\ i = 1, \dots, r). \tag{9.6}$$

The corresponding $U$-statistic is a function of the ranks of the sample variates.
The function $\Phi$ can take only a finite number of values, $c_1, \dots, c_N$, say. If
$\pi_i = P\{\Phi = c_i\}$, $(i = 1, \dots, N)$, we have

$$\theta = c_1\pi_1 + \cdots + c_N\pi_N, \qquad \sum_{i=1}^{N}\pi_i = 1.$$

$\pi_i$ is a regular functional whose kernel $\Phi_i(x_1, \dots, x_m)$ is equal to 1 or 0 according
to whether $\Phi = c_i$ or $\ne c_i$. We have

$$\Phi = c_1\Phi_1 + \cdots + c_N\Phi_N.$$

In order that $\theta(F)$ exist, the $c_i$ must be finite, and hence $\Phi$ is bounded. Therefore,
$E\{\Phi^2\}$ exists, and if $X_1, X_2, \dots$ are identically distributed, the d.f. of
$\sqrt{n}\,(U - \theta)$ tends, by Theorem 7.1, to a normal d.f. which is non-singular if
$\zeta_1 > 0$.

In the following we shall consider several examples of such functionals. 





(d) Difference sign correlation. Consider the bivariate sample

$$(x_1^{(1)}, x_1^{(2)}), (x_2^{(1)}, x_2^{(2)}), \dots, (x_n^{(1)}, x_n^{(2)}). \tag{9.7}$$

To each two members of this sample corresponds a pair of signs of the differences
of the respective variables,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \quad s(x_\alpha^{(2)} - x_\beta^{(2)}), \quad (\alpha \ne \beta;\ \alpha, \beta = 1, \dots, n). \tag{9.8}$$

(9.8) is a population of $n(n-1)$ pairs of difference signs. Since

$$\sum_{\alpha\ne\beta} s(x_\alpha^{(i)} - x_\beta^{(i)}) = 0, \quad (i = 1, 2),$$

the covariance $t$ of the difference signs (9.8) is

$$t = \frac{1}{n(n-1)}\sum_{\alpha\ne\beta} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\beta^{(2)}). \tag{9.9}$$

$t$ will be briefly referred to as the difference sign covariance of the sample (9.7).
If all $x^{(1)}$’s and all $x^{(2)}$’s are different, we have

$$\sum_{\alpha\ne\beta} s^2(x_\alpha^{(i)} - x_\beta^{(i)}) = n(n-1), \quad (i = 1, 2),$$

and then $t$ is the product moment correlation of the difference signs.

It is easily seen that $t$ is a linear function of the number of inversions in the
permutation of the ranks of $x^{(1)}$ and $x^{(2)}$.

The statistic $t$ has been considered by Esscher [6], Lindeberg [15], [16], Kendall
[12], and others.

$t$ is a $U$-statistic. As a function of a random sample from a bivariate population,
$t$ is an unbiased estimate of the regular functional of degree 2,

$$\tau = \int\!\!\int\!\!\int\!\!\int s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_2^{(2)})\,dF(x_1)\,dF(x_2). \tag{9.10}$$

$\tau$ is the covariance of the signs of differences of the corresponding components
of $X_1 = (X_1^{(1)}, X_1^{(2)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)})$ in the population of pairs of independent
vectors $X_1$, $X_2$ with identical d.f. $F(x) = F(x^{(1)}, x^{(2)})$. If $F(x^{(1)}, x^{(2)})$
is continuous, $\tau$ is the product moment correlation of the difference signs.

Two points (or vectors), $(x_1^{(1)}, x_1^{(2)})$ and $(x_2^{(1)}, x_2^{(2)})$ are called concordant or
discordant according to whether

$$(x_1^{(1)} - x_2^{(1)})(x_1^{(2)} - x_2^{(2)})$$

is positive or negative. If $\pi^{(c)}$ and $\pi^{(d)}$ are the probabilities that a pair of vectors
drawn at random from the population is concordant or discordant, respectively,
we have from (9.10)

$$\tau = \pi^{(c)} - \pi^{(d)}.$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\pi^{(c)} + \pi^{(d)} = 1$, and hence

$$\tau = 2\pi^{(c)} - 1 = 1 - 2\pi^{(d)}. \tag{9.11}$$





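If we put

$$\bar{F}(x^{(1)}, x^{(2)}) = \tfrac{1}{4}\{F(x^{(1)} - 0, x^{(2)} - 0) + F(x^{(1)} - 0, x^{(2)} + 0) + F(x^{(1)} + 0, x^{(2)} - 0) + F(x^{(1)} + 0, x^{(2)} + 0)\}, \tag{9.12}$$

we have

$$\Phi_1(x \mid \tau) = 1 - 2\bar{F}(x^{(1)}, \infty) - 2\bar{F}(\infty, x^{(2)}) + 4\bar{F}(x^{(1)}, x^{(2)}), \tag{9.13}$$

and we may write

$$\tau = E\{\Phi_1(X_1 \mid \tau)\}. \tag{9.14}$$

The variance of $t$ is, by (5.13),

$$\sigma^2(t) = \frac{2}{n(n-1)}\{2\zeta_1(\tau)(n-2) + \zeta_2(\tau)\}, \tag{9.15}$$

where

$$\zeta_1(\tau) = E\{\Phi_1^2(X_1 \mid \tau)\} - \tau^2, \tag{9.16}$$

$$\zeta_2(\tau) = E\{s^2(X_1^{(1)} - X_2^{(1)})\,s^2(X_1^{(2)} - X_2^{(2)})\} - \tau^2. \tag{9.17}$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\zeta_2(\tau) = 1 - \tau^2$, and $\bar{F}(x^{(1)}, x^{(2)})$ in (9.13)
may be replaced by $F(x^{(1)}, x^{(2)})$.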

The variance of a linear function of t has been given for the continuous case by 
Lindeberg [15], [16]. 

If $X^{(1)}$ and $X^{(2)}$ are independent and have a continuous d.f., we find $\zeta_1(\tau) = \tfrac{1}{9}$,
$\zeta_2(\tau) = 1$, and hence

$$\sigma^2(t) = \frac{2(2n+5)}{9n(n-1)}. \tag{9.18}$$

In this case the distribution of $t$ is independent of the univariate distributions
of $X^{(1)}$ and $X^{(2)}$. This is, however, no longer true if the independent variables
are discontinuous. Then it appears that $\sigma^2(t)$ depends on $P\{X_1^{(i)} = X_2^{(i)}\}$
and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.
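For small $n$ the variance (9.18) can be confirmed by brute force, because under independence and continuity every permutation of the ranks is equally probable. The sketch below computes $t$ from (9.9) and averages over all permutations; the function names are hypothetical and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    """The signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def difference_sign_covariance(xs, ys):
    """The statistic t of (9.9): mean product of difference signs over ordered pairs."""
    n = len(xs)
    total = sum(
        sign(xs[a] - xs[b]) * sign(ys[a] - ys[b])
        for a, b in permutations(range(n), 2)
    )
    return Fraction(total, n * (n - 1))


def null_variance_by_enumeration(n):
    """Variance of t when all n! rank permutations are equally probable."""
    xs = list(range(n))
    ts = [difference_sign_covariance(xs, perm) for perm in permutations(range(n))]
    mean = sum(ts) / len(ts)
    return sum((t - mean) ** 2 for t in ts) / len(ts)


def null_variance_formula(n):
    """Right-hand side of (9.18)."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


for n in range(2, 6):
    print(n, null_variance_by_enumeration(n), null_variance_formula(n))
```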

By Theorem 7.1, the d.f. of $\sqrt{n}\,(t - \tau)$ tends to the normal form. This result
has first been obtained for the particular case that all permutations of the ranks
of $X^{(1)}$ and $X^{(2)}$ are equally probable, which corresponds to the independence
of the continuous random variables $X^{(1)}$, $X^{(2)}$ (Kendall [12]). In this case $t$ can
be represented as a sum of independent random variables (cf. Dantzig [5] and
Feller [7]). In the general case the asymptotic normality of $t$ has been shown
by Daniels and Kendall [4] and the author [10].

The functional $\tau(F)$ is stationary (and hence the normal limiting distribution
of $\sqrt{n}\,(t - \tau)$ singular) if $\zeta_1 = 0$, which, in the case of a continuous $F$, means that
the equation $\Phi_1(X \mid \tau) = \tau$ or

$$4F(X^{(1)}, X^{(2)}) = 2F(X^{(1)}, \infty) + 2F(\infty, X^{(2)}) - 1 + \tau \tag{9.19}$$

is satisfied with probability 1. This is the case if $X^{(2)}$ is an increasing function
of $X^{(1)}$. Then $t = \tau = 1$ with probability 1, and $\sigma^2(t) = 0$. A case where (9.19)
is fulfilled and $\sigma^2(t) > 0$ is the following: $X^{(1)}$ is uniformly distributed in the
interval $(0, 1)$, and

$$X^{(2)} = X^{(1)} + \tfrac{1}{2} \ \text{ if } 0 \le X^{(1)} < \tfrac{1}{2}, \qquad X^{(2)} = X^{(1)} - \tfrac{1}{2} \ \text{ if } \tfrac{1}{2} \le X^{(1)} \le 1. \tag{9.20}$$

In this case $\tau = 0$, $\zeta_2 = 1$, $\sigma^2(t) = 2/n(n-1)$.

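(e) Rank correlation and grade correlation. If in the sample $\{(x_\alpha^{(1)}, x_\alpha^{(2)})\}$,
$(\alpha = 1, \dots, n)$, all $x^{(1)}$’s and all $x^{(2)}$’s are different, the rank correlation
coefficient, which we denote by $k'$, is given by

$$k' = \frac{12}{n^3 - n}\sum_{\alpha=1}^{n}\left(R_\alpha^{(1)} - \frac{n+1}{2}\right)\left(R_\alpha^{(2)} - \frac{n+1}{2}\right).$$

Inserting (9.5) we have

$$k' = \frac{3}{n^3 - n}\sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{\gamma=1}^{n} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)})$$

or

$$k' = \frac{(n-2)k + 3t}{n+1}, \tag{9.21}$$

where $t$ is the difference sign covariance (9.9), and

$$k = \frac{3}{n(n-1)(n-2)}\sum_{\alpha\ne\beta\ne\gamma} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

the summation being over all different subscripts $\alpha$, $\beta$, $\gamma$.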
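The identity (9.21) between the rank correlation coefficient, the statistic $k$, and the difference sign covariance $t$ can be checked on a small sample without ties. The following Python sketch is illustrative; the function names and the data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    """The signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def ranks(values):
    """Ranks 1..n for data without ties."""
    return [1 + sum(v > w for w in values) for v in values]


def rank_correlation(xs, ys):
    """k': product moment correlation of the ranks (no ties assumed)."""
    n = len(xs)
    centre = Fraction(n + 1, 2)
    total = sum((a - centre) * (b - centre) for a, b in zip(ranks(xs), ranks(ys)))
    return 12 * total / (n ** 3 - n)


def difference_sign_covariance(xs, ys):
    """The statistic t of (9.9)."""
    n = len(xs)
    total = sum(
        sign(xs[a] - xs[b]) * sign(ys[a] - ys[b])
        for a, b in permutations(range(n), 2)
    )
    return Fraction(total, n * (n - 1))


def k_statistic(xs, ys):
    """k: 3 / (n (n-1) (n-2)) times the sum over distinct a, b, c of s(x_a - x_b) s(y_a - y_c)."""
    n = len(xs)
    total = sum(
        sign(xs[a] - xs[b]) * sign(ys[a] - ys[c])
        for a, b, c in permutations(range(n), 3)
    )
    return Fraction(3 * total, n * (n - 1) * (n - 2))


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
n = len(xs)
lhs = rank_correlation(xs, ys)
rhs = ((n - 2) * k_statistic(xs, ys) + 3 * difference_sign_covariance(xs, ys)) / (n + 1)
print(lhs, rhs)
```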

$k$ is a $U$-statistic, and as a function of a random sample from a population with
d.f. $F$, $k$ is an unbiased estimate of the regular functional of degree 3,

$$\kappa = 3\int\cdots\int s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_3^{(2)})\,dF(x_1)\,dF(x_2)\,dF(x_3) = 3\int\!\!\int\{2\bar{F}^{(1)}(x^{(1)}) - 1\}\{2\bar{F}^{(2)}(x^{(2)}) - 1\}\,dF(x), \tag{9.22}$$

where $\bar{F}^{(1)}(x^{(1)}) = \bar{F}(x^{(1)}, \infty)$, $\bar{F}^{(2)}(x^{(2)}) = \bar{F}(\infty, x^{(2)})$.

If $F$ is continuous, we have

$$\int F^{(i)}(y)\,dF^{(i)}(y) = \int_0^1 u\,du = \tfrac{1}{2},$$

$$\int\{F^{(i)}(y) - \tfrac{1}{2}\}^2\,dF^{(i)}(y) = \int_0^1 (u - \tfrac{1}{2})^2\,du = \tfrac{1}{12}, \quad (i = 1, 2),$$

and in this case $\kappa$ is the coefficient of correlation between the random variables
$U^{(1)} = F^{(1)}(X^{(1)})$, $U^{(2)} = F^{(2)}(X^{(2)})$.





$U^{(i)}$ has been termed the grade of the continuous variable $X^{(i)}$, and in the general
case $\bar{F}^{(i)}(X^{(i)})$ may be called the grade of $X^{(i)}$ (cf., for instance, G. U. Yule and
M. G. Kendall [22, p. 150]). In general, $\kappa$ is 12 times the covariance of the
grades.

From (9.21) we have for the expected value of $k'$,

$$E\{k'\} = \frac{(n-2)\kappa + 3\tau}{n+1}.$$

In the continuous case the rank correlation coefficient $k'$ is an estimate of the
grade correlation $\kappa$, which is biased for finite $n$ but unbiased in the limit.

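The kernel $3s(x_1^{(1)} - x_2^{(1)})\, s(x_1^{(2)} - x_3^{(2)})$ of $\kappa$ is not symmetric. Denoting by $\Phi(x_1, x_2, x_3 \mid \kappa)$ the symmetric kernel of $\kappa$, we have

$$\Phi(x_1, x_2, x_3 \mid \kappa) = \tfrac12 \sum_{\alpha \ne \beta \ne \gamma} s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\gamma^{(2)}). \tag{9.23}$$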

For computing $\kappa$ and the constants $\zeta_i$ an alternative expression for $\kappa$ and $\Phi$ is sometimes more convenient. From three two-dimensional vectors $x_1, x_2, x_3$ we can form three pairs $(x_1, x_2)$, $(x_1, x_3)$, and $(x_2, x_3)$. The number of concordant pairs among them can be 3, 2, 1, or 0. If $\gamma$ is the probability that among the three pairs formed from three random elements of the population at least 2 are concordant, we have, if the d.f. $F$ is continuous,

$$\kappa = 2\gamma - 1. \tag{9.24}$$

This is analogous to the expression (9.11) for $\tau$.

The truth of (9.24) can be seen as follows: From the definition of $\gamma$ we have

$$\gamma = E\{\Phi(X_1, X_2, X_3 \mid \gamma)\},$$

where $\Phi(x_1, x_2, x_3 \mid \gamma)$ is equal to 1 if at least two of the three expressions

$$(x_\alpha^{(1)} - x_\beta^{(1)})(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha < \beta;\ \alpha, \beta = 1, 2, 3) \tag{9.25}$$

are positive, and equal to zero if no more than one of them is positive. Since, by the continuity of $F$, we may neglect the case of (9.25) being zero, we may write

$$\Phi(x_1, x_2, x_3 \mid \gamma) = c_{12,12}\, c_{23,23}\, c_{31,31} + c_{12,12}\, c_{23,23}\, c_{31,13} + c_{12,12}\, c_{23,32}\, c_{31,31} + c_{12,21}\, c_{23,23}\, c_{31,31},$$

where

$$c_{\alpha\beta,\gamma\delta} = c[(x_\alpha^{(1)} - x_\beta^{(1)})(x_\gamma^{(2)} - x_\delta^{(2)})]$$

and $c(u)$ is defined by (9.4).

$\Phi(x_1, x_2, x_3 \mid \gamma)$ is symmetric in $x_1, x_2, x_3$.

The identity

$$\Phi(x_1, x_2, x_3 \mid \kappa) = 2\Phi(x_1, x_2, x_3 \mid \gamma) - 1 \tag{9.26}$$





can be shown to hold either by algebraical calculation using (9.4) or by direct computation of each side for the different positions of the three points $x_1, x_2, x_3$.

From (9.26) it appears that in the continuous case the symmetric kernel $\Phi(x_1, x_2, x_3 \mid \kappa)$ can assume only two values, $-1$ and $+1$.
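The "direct computation for the different positions of the three points" can be sketched in a few lines of Python. In the continuous case only the relative order of the coordinates matters, so it is enough to fix the first coordinates as 1, 2, 3 and let the second coordinates run over all six orderings. The function names below are illustrative assumptions.

```python
from itertools import permutations

def s(u):
    """Sign function: -1, 0 or +1."""
    return (u > 0) - (u < 0)

def phi_kappa(pts):
    """Symmetric kernel (9.23): half the sum over all different alpha, beta, gamma."""
    total = sum(s(pts[a][0] - pts[b][0]) * s(pts[a][1] - pts[c][1])
                for a, b, c in permutations(range(3), 3))
    return total / 2

def phi_gamma(pts):
    """1 if at least two of the three pairs are concordant, else 0."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    concordant = sum((pts[a][0] - pts[b][0]) * (pts[a][1] - pts[b][1]) > 0
                     for a, b in pairs)
    return 1 if concordant >= 2 else 0

# Compare both sides of (9.26) for every relative position of three points.
results = []
for ys in permutations((1, 2, 3)):
    pts = list(zip((1, 2, 3), ys))
    results.append((ys, phi_kappa(pts), 2 * phi_gamma(pts) - 1))
```

Each entry of `results` holds an ordering of the second coordinates together with the left and right sides of (9.26).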

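The variance of $k$ is, according to (5.13),

$$\sigma^2(k) = \frac{6}{n(n-1)(n-2)} \left\{ 3\binom{n-3}{2} \zeta_1(\kappa) + 3(n-3)\,\zeta_2(\kappa) + \zeta_3(\kappa) \right\},$$

where

$$\zeta_1(\kappa) = E\{\Phi_1^2(X_1 \mid \kappa)\} - \kappa^2,$$

$$\zeta_2(\kappa) = E\{\Phi_2^2(X_1, X_2 \mid \kappa)\} - \kappa^2,$$

$$\zeta_3(\kappa) = E\{\Phi^2(X_1, X_2, X_3 \mid \kappa)\} - \kappa^2,$$

$$\Phi_1(x_1 \mid \kappa) = E\{\Phi(x_1, X_2, X_3 \mid \kappa)\},$$

$$\Phi_2(x_1, x_2 \mid \kappa) = E\{\Phi(x_1, x_2, X_3 \mid \kappa)\}.$$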

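We find for the continuous case

$$\zeta_3(\kappa) = 1 - \kappa^2,$$

$$\begin{aligned} \Phi_1(x_1 \mid \kappa) ={}& [1 - 2F(x_1^{(1)}, \infty)][1 - 2F(\infty, x_1^{(2)})] - 2F(x_1^{(1)}, \infty) - 2F(\infty, x_1^{(2)}) \\ &+ 4 \int F(x_1^{(1)}, y^{(2)})\, dF(\infty, y^{(2)}) + 4 \int F(y^{(1)}, x_1^{(2)})\, dF(y^{(1)}, \infty), \end{aligned} \tag{9.27}$$

$$\begin{aligned} \Phi_2(x_1, x_2 \mid \kappa) ={}& 1 + 2F(x_1^{(1)}, x_2^{(2)}) + 2F(x_2^{(1)}, x_1^{(2)}) \\ &- 2c(x_2^{(2)} - x_1^{(2)})\, F(x_1^{(1)}, \infty) - 2c(x_1^{(2)} - x_2^{(2)})\, F(x_2^{(1)}, \infty) \\ &- 2c(x_2^{(1)} - x_1^{(1)})\, F(\infty, x_1^{(2)}) - 2c(x_1^{(1)} - x_2^{(1)})\, F(\infty, x_2^{(2)}). \end{aligned}$$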

If $X^{(1)}$, $X^{(2)}$ are continuous and independent, we obtain $\kappa = 0$, $\zeta_1 = \tfrac19$, $\zeta_2 = \tfrac{7}{18}$, $\zeta_3 = 1$, and hence

$$\sigma^2(k) = \frac{n^2 - 3}{n(n-1)(n-2)}. \tag{9.28}$$


In the discontinuous case of independence the distribution of $k$, as that of $t$, depends on the distributions of $X^{(1)}$ and $X^{(2)}$, and $\sigma^2(k)$ can again be expressed in terms of $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.

The variance of the rank correlation coefficient $k'$ is, by (9.21),

$$\sigma^2(k') = \frac{(n-2)^2 \sigma^2(k) + 6(n-2)\,\sigma(t, k) + 9\sigma^2(t)}{(n+1)^2}. \tag{9.29}$$





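For $\sigma(t, k)$ we have, according to (6.5),

$$\sigma(t, k) = \frac{6}{n(n-1)} \{(n-3)\,\zeta_1(\tau, \kappa) + \zeta_2(\tau, \kappa)\},$$

where

$$\zeta_1(\tau, \kappa) = E\{\Phi_1(X_1 \mid \tau)\, \Phi_1(X_1 \mid \kappa)\} - \tau\kappa,$$

$$\zeta_2(\tau, \kappa) = E\{\Phi(X_1, X_2 \mid \tau)\, \Phi_2(X_1, X_2 \mid \kappa)\} - \tau\kappa.$$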

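In the case of independence we see from (9.13) and (9.27) that

$$\Phi_1(x \mid \tau) = \Phi_1(x \mid \kappa) = [1 - 2F(x^{(1)}, \infty)][1 - 2F(\infty, x^{(2)})],$$

and we obtain

$$\zeta_1(\tau, \kappa) = \zeta_1(\kappa) = \zeta_1(\tau) = \tfrac19, \qquad \zeta_2(\tau, \kappa) = \tfrac59, \tag{9.30}$$

$$\sigma(t, k) = \frac{2(n+2)}{3n(n-1)}. \tag{9.31}$$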

On inserting (9.28), (9.31) and (9.18) in (9.29), we find

$$\sigma^2(k') = \frac{1}{n-1},$$

in accordance with the result obtained for this case by Student and published by K. Pearson [20].
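The value $1/(n-1)$ can be checked in two independent ways for small $n$, as in the Python sketch below. The first function enumerates all $n!$ rankings, which are equally likely under independence in the continuous case. The second inserts the component variances into (9.29) with exact fractions. Two assumptions are made here: the numerators of (9.28) and (9.31) are taken as $n^2 - 3$ and $2(n+2)$, and (9.18), which lies outside this passage, is taken to be the standard variance $2(2n+5)/9n(n-1)$ of $t$ under independence. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def spearman_variance(n):
    """Exact variance of k' when all n! rankings are equally likely."""
    m = Fraction(n + 1, 2)
    values = []
    for perm in permutations(range(1, n + 1)):
        cov = sum((i - m) * (r - m) for i, r in enumerate(perm, start=1))
        values.append(Fraction(12) * cov / (n**3 - n))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_from_9_29(n):
    """Insert (9.28), (9.31) and (9.18) into (9.29), using exact fractions."""
    var_k = Fraction(n * n - 3, n * (n - 1) * (n - 2))    # (9.28), assumed form
    cov_tk = Fraction(2 * (n + 2), 3 * n * (n - 1))       # (9.31), assumed form
    var_t = Fraction(2 * (2 * n + 5), 9 * n * (n - 1))    # (9.18), assumed form
    return ((n - 2) ** 2 * var_k + 6 * (n - 2) * cov_tk + 9 * var_t) / (n + 1) ** 2

for n in range(3, 7):
    print(n, spearman_variance(n), variance_from_9_29(n))
```

The enumeration grows as $n!$, so it is practical only for very small $n$.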

According to Theorem 7.1, $\sqrt{n}(k - \kappa)$ tends to be normally distributed with mean 0 and variance $9\zeta_1(\kappa)$. The same is true for the distribution of the rank correlation coefficient, $k'$, as follows from Theorem 7.3 in conjunction with (9.21). For the special case of independence the asymptotic normality of $k'$ has been proved by Hotelling and Pabst [11].

From Theorem 7.3 it also follows that the joint distribution of $\sqrt{n}(t - \tau)$ and $\sqrt{n}(k - \kappa)$ (or $\sqrt{n}(k' - \kappa)$) tends to the normal form with the variances $4\zeta_1(\tau)$ and $9\zeta_1(\kappa)$ and the covariance $6\zeta_1(\kappa, \tau)$. In the case of independence we see from (9.30) that the correlation $\rho(t, k)$ between $t$ and $k$ tends to 1, and we have the asymptotic functional relation $3t = 2k$. This result has been conjectured by Kendall and others [14], and proved by Daniels [3]. In general, however, $\rho(t, k)$ does not approach unity. Thus, if $X^{(1)}$ is uniformly distributed in $(0, 1)$, and

$$X^{(2)} = \begin{cases} \tfrac12 - X^{(1)} & \text{if } 0 \le X^{(1)} < \tfrac14, \\ \tfrac12 + X^{(1)} & \text{if } \tfrac14 \le X^{(1)} < \tfrac12, \\ X^{(1)} - \tfrac12 & \text{if } \tfrac12 \le X^{(1)} < \tfrac34, \\ \tfrac32 - X^{(1)} & \text{if } \tfrac34 \le X^{(1)} \le 1, \end{cases} \tag{9.32}$$

we have $\tau = \kappa = 0$, $\zeta_1(\tau) = 0$, $\zeta_2(\tau) = 1$, $\zeta_1(\kappa) = \tfrac{1}{16}$, $\zeta_1(\kappa, \tau) = 0$, and hence $\rho(t, k) \to 0$.
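The example can be explored numerically. The Python sketch below assumes that (9.32) reads $X^{(2)} = \tfrac12 - X^{(1)}$, $\tfrac12 + X^{(1)}$, $X^{(1)} - \tfrac12$, $\tfrac32 - X^{(1)}$ on the four successive quarters of $(0, 1)$, and it replaces the random sample by evenly spaced values of $X^{(1)}$, which is an illustrative simplification. Both $t$ and $k'$ then vanish although $X^{(2)}$ is a function of $X^{(1)}$. All names are hypothetical.

```python
def x2_of(x1):
    """The functional relation between X(2) and X(1), as assumed for (9.32)."""
    if x1 < 0.25:
        return 0.5 - x1
    if x1 < 0.5:
        return 0.5 + x1
    if x1 < 0.75:
        return x1 - 0.5
    return 1.5 - x1

def s(u):
    """Sign function: -1, 0 or +1."""
    return (u > 0) - (u < 0)

def sign_and_rank_correlation(xs, ys):
    """Return (t, k_prime) for paired values without ties."""
    n = len(xs)
    t = sum(s(xs[a] - xs[b]) * s(ys[a] - ys[b])
            for a in range(n) for b in range(n) if a != b) / (n * (n - 1))
    rx = [1 + sum(v < u for v in xs) for u in xs]
    ry = [1 + sum(v < u for v in ys) for u in ys]
    m = (n + 1) / 2
    k_prime = 12 * sum((p - m) * (q - m) for p, q in zip(rx, ry)) / (n**3 - n)
    return t, k_prime

n = 40
xs = [(i + 0.5) / n for i in range(n)]   # evenly spaced stand-in for a uniform sample
ys = [x2_of(x) for x in xs]
t, k_prime = sign_and_rank_correlation(xs, ys)
print(t, k_prime)
```

A rank test based on either statistic alone therefore cannot detect this kind of dependence, which is the point of the example.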





(f) Non-parametric tests of independence. Suppose that the random variables $X^{(1)}$, $X^{(2)}$ have a continuous joint d.f. $F(x^{(1)}, x^{(2)})$, and we want to test the hypothesis $H_0$ that $X^{(1)}$ and $X^{(2)}$ are independent, that is, that

$$F(x^{(1)}, x^{(2)}) = F(x^{(1)}, \infty)\, F(\infty, x^{(2)}).$$

The distribution of any statistic involving only the ranks of the variables does not depend on the d.f. of the population when $H_0$ is true. For this reason several rank order statistics, among them the difference sign correlation $t$ and the rank correlation $k'$, have been suggested for testing independence.

From the preceding results we can obtain the asymptotic power functions of the tests of independence based on $t$ and $k'$. If $H_0$ is true, we have $E\{t\} = \tau = 0$, and the critical region of size $\epsilon$ of the $t$-test may be defined by $|t| > c_n$, where $c_n$ is the smallest number satisfying the inequality

$$P\{|t| > c_n \mid H_0\} \le \epsilon. \tag{9.33}$$

By Theorem 7.2 and (9.18) we may write $c_n = 2\lambda_n / 3\sqrt{n}$, where $\lambda_n$ tends to a positive constant $\lambda$ depending on $\epsilon$.

Since $\sigma^2(t) = O(n^{-1})$, the power function

$$P_n(H) = P\{|t| > 2\lambda_n / 3\sqrt{n} \mid H\}$$

tends to one as $n \to \infty$ for any alternative hypothesis $H$ with $\tau(F) \ne 0$. If, however, $\tau = 0$, we have $\lim P_n(H) < 1$. If $\tau = 0$ and $\zeta_1(\tau) < \tfrac19$, we have even $\lim P_n(H) < \epsilon$, and with respect to these alternatives the test is biased in the limit. Thus, in the case of the distribution (9.20) we have even $P_n(H) \to 0$. In this case there is a functional relationship between the variables, and the distribution must be considered as considerably different from the case of independence.

For the rank correlation test we have a similar result. If $c_n$ is the smallest number satisfying $P\{|k'| > c_n \mid H_0\} \le \epsilon$, we have $c_n = \lambda_n / \sqrt{n}$, where $\lim \lambda_n = \lambda$, and the test is biased in the limit if $\kappa = 0$ and $\zeta_1(\kappa) < \tfrac19$. This is fulfilled in the case of the distribution (9.32), where $\zeta_1(\kappa) = \tfrac{1}{16}$.

The question arises whether there exist non-parametric tests of independence 
which are unbiased or unbiased in the limit. This point will be discussed in a 
separate paper on tests of independence. 

(g) Mann's test against trend. Let $Y_1, \cdots, Y_n$ be $n$ independent real-valued random variables, $Y_\alpha$ having the continuous d.f. $F_\alpha(y)$, $(\alpha = 1, \cdots, n)$. The hypothesis of randomness,

$$H_1: F_1(y) = \cdots = F_n(y)$$

is to be tested against the alternative hypothesis of a “downward trend,”

$$H_2: F_1(y) < F_2(y) < \cdots < F_n(y).$$


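H. B. Mann [17] has suggested a test of $H_1$ against $H_2$ based on the number $T$ of inequalities $Y_\alpha < Y_\beta$ where $\alpha < \beta$. We may write

$$T = \frac{n(n-1)}{4} + \frac14 \sum_{\alpha \ne \beta} s(\alpha - \beta)\, s(Y_\alpha - Y_\beta).$$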





The $U$-statistic

$$t = \{4T / n(n-1)\} - 1$$

is the same as (9.9) for the special case when one component is not a random variable.
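A minimal Python sketch of the two quantities just defined: $T$ counts the pairs $\alpha < \beta$ with $Y_\alpha < Y_\beta$, and $t$ rescales it to the interval $[-1, 1]$. The function name and the sample sequences are illustrative assumptions.

```python
def mann_statistics(y):
    """Return (T, t): T counts pairs alpha < beta with y[alpha] < y[beta]."""
    n = len(y)
    T = sum(y[a] < y[b] for a in range(n) for b in range(a + 1, n))
    t = 4 * T / (n * (n - 1)) - 1
    return T, t

# A strictly decreasing sequence gives the extreme value t = -1,
# a strictly increasing one gives t = +1.
print(mann_statistics([5, 4, 3, 2, 1]))
print(mann_statistics([1, 2, 3, 4]))
```

Small values of $t$ speak for a downward trend, which is why the critical region of the test described next is of the form $t \le a_n$.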

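Let

$$\begin{aligned} \tau_{\alpha\beta} &= s(\alpha - \beta) \iint s(y_1 - y_2)\, dF_\alpha(y_1)\, dF_\beta(y_2) \\ &= s(\alpha - \beta) \left\{ 2 \int F_\beta(y)\, dF_\alpha(y) - 1 \right\}. \end{aligned}$$

We have $\tau_{\alpha\beta} = 0$ if $H_1$ is true and $\tau_{\alpha\beta} < 0$ if $H_2$ is true.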

Since

$$E\{t\} = \tau_n = \frac{2}{n(n-1)} \sum_{\alpha < \beta} \tau_{\alpha\beta},$$

it follows that $E\{t\} = 0$ under $H_1$ and $E\{t\} < 0$ under $H_2$.

Mann's test against trend has the power function $P_n(H) = P\{t \le a_n \mid H\}$, where $a_n$ is the largest number satisfying $P\{t \le a_n \mid H_1\} \le \epsilon$.

Since $a_n \to 0$ and, by (5.18), $\sigma^2(t) = O(n^{-1})$, it follows from Tchebycheff's inequality that the test is consistent (that is, $P_n(H_2) \to 1$) and hence unbiased in the limit. This has been shown by Mann who also gave sufficient conditions under which the test is unbiased for finite $n$.

By Theorems 8.1 and 8.2 the distribution of $(t - \tau_n)/\sigma(t)$ is asymptotically normal if certain conditions are satisfied. Since (8.2), (8.3) and (8.13) are fulfilled, either of the conditions (8.4) and (8.14) is sufficient.

(h) The coefficient of partial difference sign correlation. Consider a three-variate sample $x_1, \cdots, x_n$; $x_\alpha = (x_\alpha^{(1)}, x_\alpha^{(2)}, x_\alpha^{(3)})$, $(\alpha = 1, \cdots, n)$. In a similar way as in section 9d we may form the set of the $n(n-1)$ triplets of difference signs,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \quad s(x_\alpha^{(2)} - x_\beta^{(2)}), \quad s(x_\alpha^{(3)} - x_\beta^{(3)}), \qquad (\alpha \ne \beta;\ \alpha, \beta = 1, \cdots, n). \tag{9.34}$$

We shall assume that all $x^{(1)}$'s, $x^{(2)}$'s, and $x^{(3)}$'s are different. Then the triplets (9.34) contain only two different numbers, $+1$ and $-1$. Hence the regression functions of the three-variate population (9.34) are linear.

If $t_{12}$, $t_{13}$, and $t_{23}$ are the difference sign correlations of $\{s(x_\alpha^{(1)} - x_\beta^{(1)}),\ s(x_\alpha^{(2)} - x_\beta^{(2)})\}$, $\{s(x_\alpha^{(1)} - x_\beta^{(1)}),\ s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ and $\{s(x_\alpha^{(2)} - x_\beta^{(2)}),\ s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ respectively, we have for the coefficient $t_{12.3}$ of partial correlation between $s(x_\alpha^{(1)} - x_\beta^{(1)})$ and $s(x_\alpha^{(2)} - x_\beta^{(2)})$ with respect to $s(x_\alpha^{(3)} - x_\beta^{(3)})$,

$$t_{12.3} = \frac{t_{12} - t_{13}\, t_{23}}{\sqrt{(1 - t_{13}^2)(1 - t_{23}^2)}}. \tag{9.35}$$





This measure of partial correlation has been suggested by Kendall [13] who gave an alternative definition of $t_{12.3}$.
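Formula (9.35) is straightforward to evaluate once the three pairwise difference sign correlations are available. The Python sketch below computes them from data without ties and then forms the partial coefficient. The function names and the sample values are illustrative assumptions.

```python
from math import sqrt

def s(u):
    """Sign function: -1, 0 or +1."""
    return (u > 0) - (u < 0)

def sign_correlation(u, v):
    """Difference sign correlation of two samples without ties."""
    n = len(u)
    total = sum(s(u[a] - u[b]) * s(v[a] - v[b])
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))

def partial_sign_correlation(t12, t13, t23):
    """Formula (9.35); requires t13**2 != 1 and t23**2 != 1."""
    return (t12 - t13 * t23) / sqrt((1 - t13**2) * (1 - t23**2))

# Hypothetical three-variate sample of five observations.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [1, 3, 2, 5, 4]
t12 = sign_correlation(x1, x2)
t13 = sign_correlation(x1, x3)
t23 = sign_correlation(x2, x3)
print(partial_sign_correlation(t12, t13, t23))
```

When $t_{13} = t_{23} = 0$ the function returns $t_{12}$ unchanged, mirroring the remark made for the population values at the end of this section.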

If we have two independent three-dimensional random vectors $X_1 = (X_1^{(1)}, X_1^{(2)}, X_1^{(3)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)}, X_2^{(3)})$ with the same continuous d.f. $F(x^{(1)}, x^{(2)}, x^{(3)})$, the distribution of the difference signs $s(X_1^{(i)} - X_2^{(i)})$, $(i = 1, 2, 3)$, has again linear regression functions, and we may define the partial difference sign correlation

$$\tau_{12.3} = \frac{\tau_{12} - \tau_{13}\, \tau_{23}}{\sqrt{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}},$$

where $\tau_{ij}$ is the difference sign correlation of $X^{(i)}$, $X^{(j)}$.

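If $t_{12.3}$ is a function of a random sample, and if $\tau_{13}^2 \ne 1$, $\tau_{23}^2 \ne 1$, the d.f. of $\sqrt{n}(t_{12.3} - \tau_{12.3})$ tends, by Theorem 7.5, to the normal d.f. with mean zero and variance

$$\begin{aligned} \sigma_{12.3}^2 = \frac{4}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)} \Biggl\{ & \zeta_1(\tau_{12}) + \frac{(\tau_{23} - \tau_{12}\tau_{13})^2}{(1 - \tau_{13}^2)^2}\, \zeta_1(\tau_{13}) + \frac{(\tau_{13} - \tau_{12}\tau_{23})^2}{(1 - \tau_{23}^2)^2}\, \zeta_1(\tau_{23}) \\ & - 2\, \frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\, \zeta_1(\tau_{12}, \tau_{13}) - 2\, \frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\, \zeta_1(\tau_{12}, \tau_{23}) \\ & + 2\, \frac{(\tau_{23} - \tau_{12}\tau_{13})(\tau_{13} - \tau_{12}\tau_{23})}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}\, \zeta_1(\tau_{13}, \tau_{23}) \Biggr\}, \end{aligned}$$

where

$$\zeta_1(\tau_{ij}) = E\{\Phi_1^2(X \mid \tau_{ij})\} - \tau_{ij}^2,$$

$$\zeta_1(\tau_{ij}, \tau_{gh}) = E\{\Phi_1(X \mid \tau_{ij})\, \Phi_1(X \mid \tau_{gh})\} - \tau_{ij}\tau_{gh},$$

and, for instance (cf. (9.13)),

$$\Phi_1(x \mid \tau_{12}) = 1 - 2F(x^{(1)}, \infty, \infty) - 2F(\infty, x^{(2)}, \infty) + 4F(x^{(1)}, x^{(2)}, \infty).$$

If $\tau_{13} = \tau_{23} = 0$, we have

$$\sigma_{12.3}^2 = 4\zeta_1(\tau_{12}),$$

and $\sqrt{n}(t_{12.3} - \tau_{12.3})$ has the same limiting distribution as $\sqrt{n}(t_{12} - \tau_{12})$. This is in particular the case when $X^{(1)}$, $X^{(2)}$, $X^{(3)}$ are independent.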


REFERENCES

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Math., Cambridge, 1937.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] H. E. Daniels, “The relation between measures of correlation in the universe of sample permutations,” Biometrika, Vol. 33 (1944), pp. 129-135.

[4] H. E. Daniels and M. G. Kendall, “The significance of rank correlations where parental correlation exists,” Biometrika, Vol. 34 (1947), pp. 197-208.

[5] G. B. Dantzig, “On a class of distributions that approach the normal distribution function,” Annals of Math. Stat., Vol. 10 (1939), pp. 247-253.

[6] F. Esscher, “On a method of determining correlation from the ranks of the variates,” Skandinavisk Aktuar. Tids., Vol. 7 (1924), pp. 201-219.

[7] W. Feller, “The fundamental limit theorems in probability,” Am. Math. Soc. Bull., Vol. 51 (1945), pp. 800-832.

[8] C. Gini, “Sulla misura della concentrazione e della variabilità dei caratteri,” Atti del R. Istituto Veneto di S.L.A., Vol. 73 (1913-14), Part 2.

[9] P. R. Halmos, “The theory of unbiased estimation,” Annals of Math. Stat., Vol. 17 (1946), pp. 34-43.

[10] W. Höffding, “On the distribution of the rank correlation coefficient τ, when the variates are not independent,” Biometrika, Vol. 34 (1947), pp. 183-196.

[11] H. Hotelling and M. R. Pabst, “Rank correlation and tests of significance involving no assumptions of normality,” Annals of Math. Stat., Vol. 7 (1936), pp. 29-43.

[12] M. G. Kendall, “A new measure of rank correlation,” Biometrika, Vol. 30 (1938), pp. 81-93.

[13] M. G. Kendall, “Partial rank correlation,” Biometrika, Vol. 32 (1942), pp. 277-283.

[14] M. G. Kendall, S. F. H. Kendall, and B. Babington Smith, “The distribution of Spearman’s coefficient of rank correlation in a universe in which all rankings occur an equal number of times,” Biometrika, Vol. 30 (1939), pp. 251-273.

[15] J. W. Lindeberg, “Über die Korrelation,” VI Skand. Matematikerkongres i København, 1925, pp. 437-446.

[16] J. W. Lindeberg, “Some remarks on the mean error of the percentage of correlation,” Nordic Statistical Journal, Vol. 1 (1929), pp. 137-141.

[17] H. B. Mann, “Nonparametric tests against trend,” Econometrica, Vol. 13 (1945), pp. 245-259.

[18] R. v. Mises, “On the asymptotic distribution of differentiable statistical functions,” Annals of Math. Stat., Vol. 18 (1947), pp. 309-348.

[19] U. S. Nair, “The standard error of Gini’s mean difference,” Biometrika, Vol. 28 (1936), pp. 428-436.

[20] K. Pearson, “On further methods of determining correlation,” Drapers’ Company Research Memoirs, Biometric Series, IV, London, 1907.

[21] V. Volterra, Theory of Functionals, Blackie, (authorized translation by Miss M. Long), London and Glasgow, 1931.

[22] G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin, 11th Edition, London, 1937.



OPTIMUM CHARACTER OF THE SEQUENTIAL PROBABILITY RATIO TEST

A. Wald and J. Wolfowitz 
Columbia University 

1. Summary. Let $S_0$ be any sequential probability ratio test for deciding between two simple alternatives $H_0$ and $H_1$, and $S_1$ another test for the same purpose. We define $(i, j = 0, 1)$:

$\alpha_i(S_j)$ = probability, under $S_j$, of rejecting $H_i$ when it is true;

$E_i^j(n)$ = expected number of observations to reach a decision under test $S_j$ when the hypothesis $H_i$ is true. (It is assumed that $E_i^1(n)$ exists.)

In this paper it is proved that, if

$$\alpha_i(S_1) \le \alpha_i(S_0) \qquad (i = 0, 1),$$

it follows that

$$E_i^0(n) \le E_i^1(n) \qquad (i = 0, 1).$$

This means that of all tests with the same power the sequential probability ratio test requires on the average fewest observations. This result had been conjectured earlier ([1], [2]).

2. Introduction. Let $p_i(x)$, $i = 0, 1$, denote two different probability density functions or (discrete) probability functions. (Throughout this paper the index $i$ will always take the values 0, 1.) Let $X$ be a chance variable whose distribution can only be either $p_0(x)$ or $p_1(x)$, but is otherwise unknown. It is required to decide between the hypotheses $H_0$, $H_1$, where $H_i$ states that $p_i(x)$ is the distribution of $X$, on the basis of $n$ independent observations $x_1, \cdots, x_n$ on $X$, where $n$ is a chance variable defined (finite) on almost every infinite sequence

$$\omega = x_1, x_2, \cdots$$

i.e., $n$ is finite with probability one according to both $p_0(x)$ and $p_1(x)$. The definition of $n(\omega)$ together with the rule for deciding on $H_0$ or $H_1$ constitute a sequential test.

A sequential probability ratio test is defined with the aid of two positive numbers, $A^* > 1$, $B^* < 1$, as follows: Write for brevity

$$p_{ij} = \prod_{k=1}^{j} p_i(x_k).$$

Then $n = j$ if

$$\frac{p_{1j}}{p_{0j}} \ge A^* \quad \text{or} \quad \frac{p_{1j}}{p_{0j}} \le B^*$$

and

$$B^* < \frac{p_{1k}}{p_{0k}} < A^*, \qquad k < j.$$

If $p_{1n}/p_{0n} \ge A^*$, the hypothesis $H_1$ is accepted; if $p_{1n}/p_{0n} \le B^*$, the hypothesis $H_0$ is accepted.
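The stopping rule just described can be sketched in Python as follows. The function name `sprt`, the Bernoulli example and the boundary values are illustrative assumptions. The comparisons are written as products rather than quotients so that a zero value of $p_{0n}$ causes no division error.

```python
def sprt(observations, p0, p1, a_star, b_star):
    """Run the probability ratio test on a finite list of observations.

    p0 and p1 map an observation to its probability (or density) under H0
    and H1. Returns (n, accepted) with accepted in {0, 1}, or (None, None)
    if the data run out before a boundary is reached.
    """
    p0n = p1n = 1.0
    for n, x in enumerate(observations, start=1):
        p0n *= p0(x)
        p1n *= p1(x)
        if p1n >= a_star * p0n:      # ratio p1n/p0n has reached A*
            return n, 1
        if p1n <= b_star * p0n:      # ratio p1n/p0n has reached B*
            return n, 0
    return None, None

# Hypothetical example: Bernoulli observations, success probability 0.5 under H0
# and 0.75 under H1, boundaries A* = 3 and B* = 1/3.
p0 = lambda x: 0.5
p1 = lambda x: 0.75 if x == 1 else 0.25
print(sprt([1, 1, 1, 1], p0, p1, 3.0, 1 / 3))
print(sprt([0, 0, 0, 0], p0, p1, 3.0, 1 / 3))
```

For long sequences an implementation would normally accumulate logarithms of the ratio to avoid underflow; the plain products are kept here to stay close to the notation $p_{0n}$, $p_{1n}$.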

In this paper we limit consideration to sequential tests for which $E_i(n)$ exists, where $E_i(n)$ is the expected value of $n$ when $H_i$ is true (i.e., when $p_i(x)$ is the distribution of $X$). It has been proved in [3] that all sequential probability ratio tests belong to this class. The purpose of the paper is to prove the result stated in the first section. Throughout the proof we shall find it convenient to assume that there is an a priori probability $g_i$ that $H_i$ is true ($g_0 + g_1 = 1$; we shall write $g = (g_0, g_1)$). We are aware of the fact that many statisticians believe that in most problems of practical importance either no a priori probability distribution exists, or that even where it exists the statistical decision must be made in ignorance of it; in fact we share this view. Our introduction of the a priori probability distribution is a purely technical device for achieving the proof which has no bearing on statistical methodology, and the reader will verify that this is so. We shall always assume below that $g_0 \ne 0, 1$.

Let $W_0$, $W_1$, $c$ be given positive numbers. We define

$$R = g_0(W_0\alpha_0 + cE_0(n)) + g_1(W_1\alpha_1 + cE_1(n)),$$

and call $R$ the average risk associated with a test $S$ and a given $g$ (obviously $R$ is a function of both). We shall say that $H_i$ is accepted when the decision is made that $p_i(x)$ is the distribution of $X$. We shall say that $H_0$ is rejected when $H_1$ is accepted, and vice versa. The reader may find it helpful to regard $W_i$ as a weight which measures the loss caused by rejecting $H_i$ when it is true, $c$ as the cost of a single observation, and $R$ as the average loss associated with a given $g$ and a test $S$. For mathematical purposes these are simply quantities which we manipulate in the course of the proof.
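As a small illustration of the definition of $R$, the Python sketch below evaluates the average risk from the error probabilities and expected sample sizes of a test. The function name and all numerical values are illustrative assumptions.

```python
def average_risk(g, W, alpha, expected_n, c):
    """R = g0*(W0*alpha0 + c*E0(n)) + g1*(W1*alpha1 + c*E1(n))."""
    return sum(g[i] * (W[i] * alpha[i] + c * expected_n[i]) for i in (0, 1))

# A hypothetical test with error probabilities 0.05 and ten observations on average.
r_test = average_risk((0.5, 0.5), (1.0, 1.0), (0.05, 0.05), (10, 10), 0.01)

# The test that takes no observations and always rejects H0:
# alpha0 = 1, alpha1 = 0, E0(n) = E1(n) = 0, so that R = g0 * W0.
r_trivial = average_risk((0.3, 0.7), (2.0, 5.0), (1.0, 0.0), (0, 0), 0.01)
print(r_test, r_trivial)
```

Comparing such values for different tests, with $g$, $W$ and $c$ held fixed, is the comparison on which the proof below rests.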


3. Role of the probability ratio. Let $g$, $W = (W_0, W_1)$, and $c$ be fixed. Let $S$ be a given sequential test, with $R(S)$ the associated risk and $n(\omega, S)$ the associated “sample size” function. Let $\psi(x_1, \cdots, x_n)$ be the “decision” function; this is a function which takes only the values 0 and 1, and such that, when $x_1, \cdots, x_n$ is the sample point, the hypothesis with index $\psi(x_1, \cdots, x_n)$ is rejected. Define the following decision function $\varphi(x_1, \cdots, x_n)$: $\varphi = 0$ when

$$\lambda = \frac{W_1 g_1 p_{1n}}{W_0 g_0 p_{0n}}$$

is greater than 1, and $\varphi = 1$ when $\lambda < 1$. When $\lambda = 1$, $\varphi$ may be 0 or 1 at pleasure.

It must be remembered that an actual decision function is a single-valued function of $(x_1, \cdots, x_n)$. We note, however, that

a) the relevant properties of a test are not affected by changing the test on a set $T$ of points $\omega$ whose probability is zero according to both $H_0$ and $H_1$, i.e., changing the definition on $T$ of $n$ and/or of the decision function, leaves $\alpha_0$, $\alpha_1$, $E_0(n)$ and $E_1(n)$ unaltered. In particular, the average risk $R$ remains unchanged.

b) the set of points for which $p_{0n} = p_{1n} = 0$ and $\lambda$ is indeterminate, has probability zero according to both $H_0$ and $H_1$.

In view of the above we decide arbitrarily, in all sequential tests which we shall henceforth consider, to define $n = j$, and $\varphi = 0$, whenever $p_{0j} = p_{1j} = 0$, and $n \ne 1, \cdots, (j - 1)$. By this arbitrary action $R(S)$ will not be changed.

Let now

$$L_{in} = \frac{W_i g_i p_{in}}{g_0 p_{0n} + g_1 p_{1n}}, \qquad L_n = cn + \min(L_{0n}, L_{1n}).$$

We have

$$E L_{\psi n} = \sum_i g_i W_i \alpha_i,$$

where the operator $E$ denotes the expected value with respect to the joint distribution of $H_i$ and $(x_1, \cdots, x_n)$, i.e., $E$ is the operator $g_0E_0 + g_1E_1$. If now the event $\{\psi \ne \varphi \text{ and } \lambda \ne 1\}$ has positive probability according to either $H_0$ or $H_1$, we would have, for $n = n(\omega, S)$,

$$E L_{\varphi n} < E L_{\psi n}.$$

Hence, if the decision function $\psi$ connected with the test $S$ were replaced by the decision function $\varphi$, $R$ would be decreased. Since our object throughout this proof will be to make $R$ as small as possible, we shall confine ourselves henceforth, except when the contrary is explicitly stated, to tests for which $\varphi$ is the decision function. This will be assumed even if not explicitly stated.
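The role of $\varphi$ can be made concrete with a short Python sketch. It computes the two quantities $L_{0n}$ and $L_{1n}$ for a given sample, returns the index $\varphi$ of the hypothesis to reject, and returns $L_n$. The function name and the numbers are illustrative assumptions; when $\lambda = 1$ the sketch returns $\varphi = 1$, which is one of the two choices the text leaves open.

```python
def decide_and_loss(g, W, c, p0n, p1n, n):
    """Return (phi, L_n) for a sample of size n with joint probabilities p0n, p1n.

    phi is the index of the rejected hypothesis; L_n = c*n + min(L_0n, L_1n).
    """
    denom = g[0] * p0n + g[1] * p1n
    if denom == 0:
        raise ValueError("p0n and p1n are both zero; lambda is indeterminate")
    loss_reject_h0 = W[0] * g[0] * p0n / denom   # L_0n
    loss_reject_h1 = W[1] * g[1] * p1n / denom   # L_1n
    # lambda > 1 is equivalent to L_1n > L_0n, in which case H0 is rejected.
    phi = 0 if loss_reject_h1 > loss_reject_h0 else 1
    return phi, c * n + min(loss_reject_h0, loss_reject_h1)

phi, loss = decide_and_loss((0.5, 0.5), (1.0, 1.0), 0.1, 0.2, 0.6, 3)
print(phi, loss)
```

Rejecting the hypothesis with index $\varphi$ always incurs the smaller of the two quantities, which is why replacing any other decision function $\psi$ by $\varphi$ cannot increase $R$.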

The function $\varphi$ has not yet been uniquely defined when $\lambda = 1$. A definition convenient for later purposes will be given in the next section. $R$ is the same for all definitions.

We thus have that $\varphi$ is a function only of $\lambda$, or, what comes to the same thing when $W$ is fixed, of the probability ratio $p_{1n}/p_{0n}$. Define

$$r_j = \frac{p_{1j}}{p_{0j}}.$$


We shall now prove

Lemma 1. Let $g$, $W$, and $c$ be fixed. There exists a sequential test $S^*$ for which the average risk is a minimum. Its sample size function $n(\omega, S^*)$ can be defined by means of a properly chosen subset $K$ of the non-negative half-line as follows: For any $\omega$ consider the associated sequence

$$r_1, r_2, \cdots$$

and let $j$ be the smallest integer for which $r_j \in K$. Then $n = j$. The function $n$ may be undefined on a set of points $\omega$ whose probability according to $H_0$ and $H_1$ is zero.

Let $a = (a_1, \cdots, a_d)$ be any point in some finite $d$-dimensional Euclidean space, provided only that $p_{0d}(a)$ and $p_{1d}(a)$ are not both zero. Let $b = p_{1d}(a)/p_{0d}(a)$ and let $l(a) = cd + \min(L_{0d}, L_{1d})$. Let $D$ be any sequential test whatever for which $n(\omega, D) > d$ for any $\omega$ whose first $d$ coordinates are the same as those of $a$, and for which $E(n \mid a, D) < \infty$, where $E(n \mid a, D)$ is the conditional expected value of $n$ according to the test $D$ under the condition that the first $d$ coordinates of $\omega$ are the same as those of $a$. For brevity let $G$ represent the set of points $\omega$ which fulfill this last condition, i.e., that the first $d$ coordinates of $\omega$ are the same as those of $a$. Finally, let $E(L_n \mid a, D)$ be the conditional expected value of $L_n$ according to $D$ under the condition that $\omega$ is in the set $G$. We know that $\min(L_{0d}, L_{1d})$ depends only on $r_d(a) = b$.

Write

$$\nu(a) = \sup_{D}\, [l(a) - E(L_n \mid a, D)].$$

Let $a_0 = (a_{01}, \cdots, a_{0k})$ be any point such that

$$\frac{p_{1d}(a)}{p_{0d}(a)} = \frac{p_{1k}(a_0)}{p_{0k}(a_0)}.$$

Let $D_0$ be any sequential test whatever for which $n(\omega, D_0) > k$ for any $\omega$ whose first $k$ coordinates are the same as those of $a_0$, and for which $E(n \mid a_0, D_0) < \infty$. Let

$$\nu(a_0) = \sup_{D_0}\, [l(a_0) - E(L_n \mid a_0, D_0)].$$

We shall prove that $\nu(a) = \nu(a_0)$. Thus we shall be justified in writing

$$\gamma(b) = \nu(a) = \nu(a_0).$$

Suppose, therefore, that $\nu(a) > \nu(a_0)$. Let $D_1$ be a test of the type $D$ such that

$$l(a) - E(L_n \mid a, D_1) > \frac{\nu(a) + \nu(a_0)}{2}.$$

We now partially define another sequential test $D_{10}$ of the type $D_0$ as follows: Let

$$\bar a = a_1, \cdots, a_d, y_1, \cdots, y_t$$

be any sequence such that $n(\bar a, D_1) = d + t$. Then for the sequence

$$\bar a_0 = a_{01}, \cdots, a_{0k}, y_1, \cdots, y_t$$

let $n(\bar a_0, D_{10}) = k + t$. The decision function $\psi_0$ associated with $D_{10}$ will be partially defined as follows:

$$\psi_0(\bar a_0) = \varphi(\bar a).$$

(The reader will observe that it may happen that $\psi_0(\bar a_0) \ne \varphi(\bar a_0)$.) Since $r_d(a) = r_k(a_0)$ it follows that

$$l(a) - E(L_n \mid a, D_1) = l(a_0) - E(L_n \mid a_0, D_{10}) > \frac{\nu(a) + \nu(a_0)}{2} > \nu(a_0),$$

in violation of the definition of $\nu(a_0)$. A similar contradiction is obtained if $\nu(a) < \nu(a_0)$. Hence $\nu(a) = \nu(a_0)$ as was stated above.

We define $K$ to consist of all numbers $b$ which are such that there exist points $a$ with $r_d(a) = b$, and for which $\gamma(b) \le 0$. We shall now prove that the test $S^*$ defined in the statement of the lemma is such that $R(S^*)$ is a minimum. Recall that the average risk is the expected value of $L_n$. Let $S$ be any other test. Let $a^* = (a_1, \cdots, a_{d^*})$ be any sequence such that either $n(a^*, S^*) = d^*$, or $n(a^*, S) = d^*$, but $n(a^*, S^*) \ne n(a^*, S)$. We exclude the trivial case that the probability of the occurrence of such a sequence, under both $H_0$ and $H_1$, is zero. Let $r_{d^*}(a^*) = b^*$. The sequence $a^*$ may be one of three types:

1) $\gamma(b^*) < 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. It is more advantageous, from the point of view of diminishing the average risk, to terminate the sequential process at once, since $E(L_n \mid a^*, S) > l(a^*)$.

2) $\gamma(b^*) = 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. If $l(a^*) - E(L_n \mid a^*, S) = 0$, i.e., the supremum is actually attained by $S$, then, as far as the average risk is concerned, it makes no difference whether the sequential process is terminated with $a^*$ or continued according to $S$. If, however, $l(a^*) - E(L_n \mid a^*, S) < 0$, it is clearly disadvantageous to proceed according to $S$. It is impossible that $l(a^*) - E(L_n \mid a^*, S) > 0$, since $\gamma(b^*) = 0$.

3) $\gamma(b^*) > 0$. Hence $b^* \notin K$, $n(a^*, S) = d^*$. Clearly it is more advantageous from the point of view of diminishing the average risk not to terminate the sequential process, but to continue with at least one more observation. After one more observation we are either in case 1 or 2, where it is advantageous to terminate the sequential process, or again in case 3, where it is advantageous to take yet another observation.

We conclude that $R(S^*)$ is a minimum, as was to be proved.

4. A fundamental lemma. Consider the complement of $K$ with respect to the non-negative half-line, and from it delete all points $b'$ for which there exists no point $a$ in some $d$-dimensional Euclidean space such that $r_d(a) = b'$. The point 1 is never to be considered as of the type of $b'$, i.e., 1 is never to be deleted. Designate the resulting set by $\bar K$.





Our proof of the theorem to which this paper is devoted hinges on the following lemma:

Lemma 2. Let $W$, $g$, $c$ be fixed, and $\bar K$ be as defined above. There exist two positive numbers $A$ and $B$, with $B \le \dfrac{W_0 g_0}{W_1 g_1} \le A$, such that

a) if $b \in K$, then either $b \ge A$ or $b \le B$

b) if $b \in \bar K$, $B < b < A$.

Two remarks may be made before proceeding with the proof:

1) We may now complete the definition of $\varphi$ for tests of the type of $S^*$. The reader will recall that $\varphi$ was not uniquely defined when $\lambda = 1$, i.e., when $r_n = \dfrac{W_0 g_0}{W_1 g_1}$. Lemma 2 shows that it is necessary to define $\varphi$ only when $\dfrac{W_0 g_0}{W_1 g_1} \in K$ and $\dfrac{W_0 g_0}{W_1 g_1}$ is therefore either $A$ or $B$. We will define $\varphi\!\left(\dfrac{W_0 g_0}{W_1 g_1}\right)$ as 0 or 1, according as $\dfrac{W_0 g_0}{W_1 g_1}$ is $A$ or $B$, and $A \ne B$. This is simply a convenient definition which will give uniqueness. When $A = B = \dfrac{W_0 g_0}{W_1 g_1} \in K$, the situation is completely trivial, and we may take $\varphi = 0$ arbitrarily.

2) If $1 \in K$ the above lemma shows that the average risk is minimized (for fixed $W$, $g$, $c$, of course) by taking no observations at all. We have $\varphi = 0$ or 1 according as $1 \ge A$ or $1 \le B$.


Proof of the lemma: Let $h > \dfrac{W_0 g_0}{W_1 g_1}$ be a point in $\bar K$. We will prove that any point $h'$ such that $\dfrac{W_0 g_0}{W_1 g_1} < h' < h$, and such that there exists a point $a'$ in some $d'$-dimensional Euclidean space for which $r_{d'}(a') = h'$, is also in $\bar K$. In a similar way it can be shown that, if $h_0 < \dfrac{W_0 g_0}{W_1 g_1}$ is any point in $\bar K$, any point $h_0'$ such that $h_0 < h_0' < \dfrac{W_0 g_0}{W_1 g_1}$, and such that there exists a point $a_0'$ in some $d''$-dimensional Euclidean space for which $r_{d''}(a_0') = h_0'$, is also in $\bar K$. This will prove the lemma.

Let therefore $h$ and $h'$ be as above. Let $S^*$ be the sequential test based on $K$, with the decision function $\varphi$. Let $a$ be a point in $d$-space such that $r_d(a) = h$. Since $h \in \bar K$ we have $\gamma(h) > 0$.

We now wish to define partially another sequential test $\bar S$, with a decision function $\psi$ which may be different from $\varphi$, as follows: Let $a'$ be defined as above. Write

$$a = (a_1, \cdots, a_d), \qquad a' = (a_1', \cdots, a_{d'}').$$

Let

$$\bar a = a_1, \cdots, a_d, y_1, \cdots, y_t$$

be any sequence such that $n(\bar a, S^*) = d + t$. Then for the sequence

$$\bar a' = a_1', \cdots, a_{d'}', y_1, \cdots, y_t$$

let $n(\bar a', \bar S) = d' + t$. The decision function $\psi$ associated with $\bar S$ will be partially defined as follows:

$$\psi(\bar a') = \varphi(\bar a).$$

Clearly

$$E_i(n \mid a, S^*) - d = E_i(n \mid a', \bar S) - d' \qquad (i = 0, 1) \tag{4.1}$$

and

$$E_i(\varphi \mid a, S^*) = E_i(\psi \mid a', \bar S) \qquad (i = 0, 1). \tag{4.2}$$

Furthermore, we have

$$\begin{aligned} l(a) - E(L_n \mid a, S^*) ={}& \frac{g_0}{g_0 + g_1 h} \{ W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)] \} \\ &+ \frac{g_1 h}{g_0 + g_1 h} \{ cd - cE_1(n \mid a, S^*) - W_1 E_1(\varphi \mid a, S^*) \}. \end{aligned} \tag{4.3}$$

Since $\gamma(h) > 0$, and since

$$cd - cE_1(n \mid a, S^*) - W_1 E_1(\varphi \mid a, S^*) < 0, \tag{4.4}$$

we must have

$$W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)] > 0. \tag{4.5}$$

From $h' < h$ it follows that

$$\frac{g_0}{g_0 + g_1 h'} > \frac{g_0}{g_0 + g_1 h}, \quad \text{and} \quad \frac{g_1 h'}{g_0 + g_1 h'} < \frac{g_1 h}{g_0 + g_1 h}. \tag{4.6}$$

Relations (4.1), (4.2), (4.4), (4.5) and (4.6) imply that the value of the right hand member of (4.3) is increased by replacing $\varphi$, $h$, $a$, $S^*$ and $d$ by $\psi$, $h'$, $a'$, $\bar S$, and $d'$, respectively. This proves our lemma.
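The content of Lemma 2 can be illustrated numerically, although only in a simplified setting that is not the one treated in the paper. The Python sketch below assumes Bernoulli observations and a truncated problem in which at most `horizon` further observations may be taken, and it works with the a posteriori probability $\pi = g_1 h / (g_0 + g_1 h)$ of $H_1$ in place of the ratio $h$. It compares the loss of stopping at once with the smallest expected loss attainable after taking one more observation, and collects the grid values of $\pi$ at which continuing is better. All names and numerical values are illustrative assumptions.

```python
def continuation_set(p0, p1, W0, W1, c, horizon, grid):
    """Posterior probabilities of H1 at which one more observation is worthwhile.

    p0, p1: success probabilities of a Bernoulli observation under H0 and H1.
    horizon: maximum number of further observations (a truncation assumed here).
    """
    def stop_loss(pi):
        # min(L_0, L_1): reject whichever hypothesis costs less a posteriori
        return min(W0 * (1 - pi), W1 * pi)

    def continue_loss(pi, m):
        total = c                                      # cost of one observation
        for q0, q1 in ((p0, p1), (1 - p0, 1 - p1)):    # outcomes x = 1 and x = 0
            mix = (1 - pi) * q0 + pi * q1              # predictive probability
            if mix > 0:
                total += mix * value(pi * q1 / mix, m - 1)
        return total

    def value(pi, m):
        if m == 0:
            return stop_loss(pi)
        return min(stop_loss(pi), continue_loss(pi, m))

    return [pi for pi in grid if continue_loss(pi, horizon) < stop_loss(pi)]

grid = [i / 100 for i in range(1, 100)]
cont = continuation_set(0.4, 0.6, 1.0, 1.0, 0.01, 8, grid)
print(cont[0], cont[-1])   # end points of the continuation region on the grid
```

In this example the grid values at which continuing is better form a single unbroken run, the analogue of $B < b < A$ in terms of $\pi$.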

If there are values which $r_j$ cannot assume, the pair $B$, $A$ might not be unique. For convenience we shall define $A$ and $B$ uniquely in the manner described below. We will always adhere to this definition thereafter.

We shall first define $\gamma(h)$ for all positive $h$ in a manner consistent with the previous definition, which defined $\gamma(h)$ only for those values of $h$ which could be assumed by $r_j$. Let $h$ be any positive number and $D(h)$ be any sequential test with the following properties:

(4.7) there exists a set $Q(h)$ of positive numbers such that $n = j$ if and only if the $j$-th member of the sequence

$$hr_1, hr_2, hr_3, \cdots$$

is the first element of the sequence to be in $Q(h)$;

(4.8) $E_i(n \mid D(h)) < \infty$ $(i = 0, 1)$.

We define, for $h > \dfrac{W_0 g_0}{W_1 g_1}$,

$$\begin{aligned} \gamma(h \mid D(h)) ={}& \frac{g_0}{g_0 + g_1 h} \{ W_0 E_0(\varphi \mid D(h)) - cE_0(n \mid D(h)) \} \\ &+ \frac{g_1 h}{g_0 + g_1 h} \{ -W_1 E_1(\varphi \mid D(h)) - cE_1(n \mid D(h)) \}, \end{aligned} \tag{4.9}$$

$$\gamma(h) = \sup_{D(h)} \gamma(h \mid D(h)) \tag{4.10}$$

with a corresponding definition for $h < \dfrac{W_0 g_0}{W_1 g_1}$. Thus $\gamma(h)$ is defined for all positive $h$. This definition coincides with the previous definition whenever the latter is applicable. It is true that the supremum operation in (4.10) is limited to tests which depend only on the probability ratio, as (4.7) implies, but the argument of Lemma 1 shows that this limitation does not diminish the supremum.

(It might appear that, for $h = \dfrac{W_0 g_0}{W_1 g_1}$, $\gamma(h)$ is not uniquely defined. We shall shortly see that this is not the case.)

The quantity $\gamma(h)$ depends, of course, on $g_0$ and $g_1$. To put this in evidence, we shall also write $\gamma(h, g_0, g_1)$. One can easily verify that

$$\gamma(h, g_0, g_1) = \gamma\!\left(1, \frac{g_0}{g_0 + g_1 h}, \frac{g_1 h}{g_0 + g_1 h}\right).$$

More generally, for any positive values $h$ and $h'$, we have $\gamma(h, g_0, g_1) = \gamma(h', \bar g_0, \bar g_1)$, where $\bar g_0$ and $\bar g_1$ are suitable functions of $g_0$, $g_1$, $h$, and $h'$. Thus, if $h$ is not an admissible value of the probability ratio and $h'$ is any admissible value, we can interpret the value of $\gamma(h, g_0, g_1)$ as the value of $\gamma$ corresponding to $h'$ and some properly chosen a priori probabilities $\bar g_0$ and $\bar g_1$.


We now define $A$ as the greatest lower bound of all points $h \ge \dfrac{W_0 g_0}{W_1 g_1}$ for which $\gamma(h) \le 0$. We define $B$ as the least upper bound of all points $h \le \dfrac{W_0 g_0}{W_1 g_1}$ for which $\gamma(h) \le 0$. If $\gamma(h) \le 0$ for all $h$ the above definition implies $A = B = \dfrac{W_0 g_0}{W_1 g_1}$.

The argument of Lemma 2 shows that $\gamma(h)$ is monotonically increasing in the interval $\left(B, \dfrac{W_0 g_0}{W_1 g_1}\right)$, and that $\gamma(h)$ is monotonically decreasing in the interval $\left(\dfrac{W_0 g_0}{W_1 g_1}, A\right)$.

We shall now define a sequential test $S^*(h)$ for every positive $h$. The decision function of $S^*(h)$ will be $\varphi$, and $n = j$ if and only if the $j$-th member of the sequence

$$\gamma(hr_1), \gamma(hr_2), \gamma(hr_3), \cdots$$

is the first element to be $\le 0$. We see that

$$\gamma(h) = \gamma(h \mid S^*(h)) \tag{4.11}$$

for all $h$. Incidentally, this proves that $\gamma(h)$ was uniquely defined at $h = \dfrac{W_0 g_0}{W_1 g_1}$.

We shall now prove

Lemma 3. The function $\gamma(h)$ has the following properties:

a) It is continuous for all $h$.

b) $\gamma(A) = \gamma(B) = 0$

c) $\gamma(h) < 0$ for $h > A$ or $< B$.

Only a) and c) require proof, since b) is a trivial consequence of a) and the definition of $A$ and $B$.

Let $h$ be any point except $\dfrac{W_0 g_0}{W_1 g_1}$, and let $z$ be any point in a neighborhood of $h$. Within a neighborhood of $h$ both $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded. Let $\Delta$ be an arbitrarily given, positive number. Let $h'$ and $h''$ be any two points in a sufficiently small neighborhood of $h$, to be described shortly. We proceed as in the argument of Lemma 2, with the present $h'$ corresponding to $h$ of Lemma 2, the present $h''$ corresponding to $h'$ of Lemma 2, and with $S^*(h')$ corresponding to $S^*$ of Lemma 2. Since $\dfrac{g_0}{g_0 + g_1 z}$ and $\dfrac{g_1 z}{g_0 + g_1 z}$ are continuous functions of $z$, and since $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded functions of $z$, we conclude that, when the neighborhood of $h$ is sufficiently small,

$$\gamma(h'') > \gamma(h') - \Delta.$$

Reversing the roles of $h'$ and $h''$ we obtain that in this neighborhood

$$\gamma(h') > \gamma(h'') - \Delta,$$

and conclude that

$$|\gamma(h') - \gamma(h'')| < \Delta.$$

Since $\Delta$ was arbitrary, this implies the continuity of $\gamma(h)$ everywhere, except perhaps at $h = \dfrac{W_0 g_0}{W_1 g_1}$.


To deal with the point $h = \frac{W_0 g_0}{W_1 g_1}$, proceed as follows: Using the above argument
and the definition (4.9), (4.10), we prove that $\gamma(h)$ is continuous on the right
at $h = \frac{W_0 g_0}{W_1 g_1}$. Using, at the point $h = \frac{W_0 g_0}{W_1 g_1}$, the definition of $\gamma(h \mid D(h))$ for
$h < \frac{W_0 g_0}{W_1 g_1}$,

$$\gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1 h}\{-W_0 E_0(1 - \varphi \mid D(h)) - cE_0(n \mid D(h))\} + \frac{g_1 h}{g_0 + g_1 h}\{W_1 E_1(1 - \varphi \mid D(h)) - cE_1(n \mid D(h))\}, \tag{4.12}$$

(4.10) and (4.11), we prove that $\gamma(h)$ is continuous on the left at $h = \frac{W_0 g_0}{W_1 g_1}$.

This proves a).

To prove c), we proceed as follows: Suppose for $h_0 > A$ we had $\gamma(h_0) = 0$.
Since

$$\{-W_1 E_1(\varphi \mid S^*(h_0)) - cE_1(n \mid S^*(h_0))\} < 0,$$

we would have that

$$\{W_0 E_0(\varphi \mid S^*(h_0)) - cE_0(n \mid S^*(h_0))\} > 0.$$

An argument like that of Lemma 2 would then show that $\gamma(h) > 0$ for
$\frac{W_0 g_0}{W_1 g_1} < h < h_0$. This, however, is impossible, because it is a violation of the definition
of $A$.

In a similar way we prove that if $h < B$, $\gamma(h) < 0$. This proves c) and with
it the lemma.


5. The behavior of A and B. Lemma 4. Let $g$ and $c$ be fixed. Then $A$ and
$B$ are continuous functions of $W_0$ and $W_1$.

Proof: It will be sufficient to prove that $A$ is continuous, the proof for $B$
being similar. Suppose $A > B$. Let $h_1$ and $h_2$ be such that

a) $B < h_1 < A < h_2$;

b) $h_2 - h_1 < \Delta$ for an arbitrary positive $\Delta$.

We write $\gamma(h)$ temporarily as $\gamma(h, W_0, W_1)$ in order to exhibit the dependence
on $W_0$ and $W_1$. Then

$$\gamma(h_1, W_0, W_1) > 0; \qquad \gamma(h_2, W_0, W_1) < 0.$$

It follows from (4.9) that $\gamma(h \mid D(h))$ is continuous in $W_0$, $W_1$, uniformly in
$D(h)$. Hence $\gamma(h, W_0, W_1) = \sup_{D(h)} \gamma(h \mid D(h))$ is also continuous in $W_0$, $W_1$.
Hence, for $\Delta W_0$ and $\Delta W_1$ sufficiently small,

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) > 0; \qquad \gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Therefore

$$h_1 < A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2,$$

which proves continuity, since $\Delta$ was arbitrary.

If $\frac{W_0 g_0}{W_1 g_1} = A = B$, we take $h_1 < \frac{W_0 g_0}{W_1 g_1} < h_2$, $h_2 - h_1 < \Delta$, and by a similar
argument show that

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0; \qquad \gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Thus

$$h_1 < B(W_0 + \Delta W_0, W_1 + \Delta W_1) \le A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2.$$

This proves the lemma.

Lemma 5. Let $g$, $c$, and $W_1$ be fixed. $A$ is strictly monotonic in $W_0$. As $W_0$
approaches 0, $A$ approaches 0; as $W_0$ approaches $+\infty$, $A$ also approaches $+\infty$.

Proof: Since $A \ge \frac{W_0 g_0}{W_1 g_1}$, $A \to +\infty$ as $W_0 \to +\infty$. If $W_0 < c$ no reduction
in average risk could compensate for taking even a single observation, no
matter what the value of $h$. Hence $\gamma(h) < 0$ for all $h$ when $W_0 < c$, so that
$A = B$. Since $B \le \frac{W_0 g_0}{W_1 g_1}$, $B \to 0$ as $W_0 \to 0$. Hence $A \to 0$ as $W_0 \to 0$.

It is evident from (4.9) that $\gamma(h \mid D(h))$ is non-decreasing with increasing $W_0$
(everything else fixed). Hence also

$$\gamma(h) = \sup_{D(h)} \gamma(h \mid D(h))$$

is non-decreasing with increasing $W_0$, for fixed $h \ge \frac{W_0 g_0}{W_1 g_1}$ and fixed $W_1$. For a
positive $\Delta$ sufficiently small and for any $h$ such that $A < h < A + \Delta$, we have
that

$$E_0(\varphi \mid S^*(h)) > 0.$$

Hence, for such $h$, $\gamma(h, W_0, W_1)$ is strictly monotonically increasing with increasing
$W_0$. Therefore $A$ is (strictly) monotonically increasing with increasing $W_0$.

We now define the function $W_0(W_1, \delta)$ of the two positive arguments $W_1$,
$\delta$ so that

$$A(W_0(W_1, \delta), W_1) = \delta.$$

By Lemma 5 such a function exists and is single-valued.


6. Properties of the function $W_0(W_1, \delta)$. Lemma 6. $W_0(W_1, \delta)$ is continuous
in $W_1$.

Proof: Let

$$\lim_{N \to \infty} W_{1N} = W_1,$$

and suppose that the sequence $\{W_0(W_{1N}, \delta)\}$ did not converge. Suppose $W_0'$
and $W_0''$ were two distinct limit points of this sequence. From the continuity
of $A$ (Lemma 4) it would follow that

$$A(W_0', W_1) = A(W_0'', W_1).$$

This, however, violates Lemma 5. The only remaining possibility to be considered
is that

$$\lim W_0(W_{1N}, \delta) = \infty.$$

If that were the case, then, since $A \ge \frac{W_0 g_0}{W_1 g_1}$, it would follow that $A \to \infty$,
in violation of the fact that $A = \delta$.

Lemma 7. We have, for fixed $\delta$,

$$\lim_{W_1 \to 0} W_0(W_1) = 0; \qquad \lim_{W_1 \to \infty} W_0(W_1) = \infty.$$

Proof: If, for small $W_1$, $W_0(W_1)$ were bounded below by a positive number,
then, since $A \ge \frac{g_0 W_0(W_1, \delta)}{W_1 g_1}$, we could make $A$ arbitrarily large by taking $W_1$
sufficiently small, in violation of the fact that $A = \delta$. To prove the second half
of the lemma, assume that $W_0(W_1)$ is bounded above as $W_1 \to \infty$. Then
$B \left(\le \frac{W_0 g_0}{W_1 g_1}\right)$ will approach zero as $W_1 \to \infty$. Let $h$ be fixed so that $B < h < \delta$.
Consider the totality of points $\omega$ for which there exists an integer $n^*(\omega)$ such that:

$$h r_{n^*} \le B; \qquad B < h r_j < \delta, \quad j < n^*.$$

The conditional expected value of $n^*$ in this totality, when $H_0$ is true, may be
made arbitrarily large by making $B$ sufficiently small. Hence, when $W_1$ is
sufficiently large, for fixed but arbitrary $h < \delta$, the optimum procedure from the
point of minimizing the average risk is to reject $H_0$ at once without taking any
more observations. This, however, contradicts the fact that $h < \delta$, and proves
the lemma.

Lemma 8. We have, for fixed $\delta > 1$,

$$\lim_{W_1 \to 0} B(W_0(W_1, \delta), W_1) = \delta; \qquad \lim_{W_1 \to \infty} B(W_0(W_1, \delta), W_1) = 0.$$

Proof: By Lemma 7,

$$\lim_{W_1 \to 0} W_0(W_1) = 0.$$

When, for fixed $c$, both $W_0$ and $W_1$ are small enough, then, no matter what the
value of $h$, $\gamma(h) < 0$. Hence $A = B$, which proves the first half of the lemma.

Let now $\{W_{1N}\}$ be a sequence such that $\lim W_{1N} = \infty$. Let $\delta > 1$. For the
sake of brevity we write $B(W_{1N})$ instead of

$$B(W_0(W_{1N}, \delta), W_{1N}).$$

Suppose that, for sufficiently large $N$, $B(W_{1N})$ is bounded below by a positive
number. Hence, for sufficiently large $N$, the probability of rejecting $H_1$ when
it is true is bounded below by a positive number. Moreover, since
$B \le \frac{W_0 g_0}{W_1 g_1} \le A$, it follows that, for $N$ sufficiently large, $\frac{W_{0N} g_0}{W_{1N} g_1}$ is bounded above
and below by positive constants. Thus, for large $N$ the average risk of the test
defined by $B(W_{1N})$, $\delta$, is greater than $u g_1 W_{1N}$, where $u$ is a positive constant
which does not depend on $N$. Moreover, from the definition of $B(W_{1N})$, this
risk is a minimum.

Let $\epsilon$ be a positive number such that $\epsilon\left(\frac{g_0 W_{0N}}{g_1 W_{1N}} + 1\right) < \frac{u}{2}$ for all $N$ sufficiently
large. Let $V_1$, $V_2$, with $0 < V_1 < 1 < V_2$, be two constants such that, for the
sequential probability ratio test determined by them, both $\alpha_0$ and $\alpha_1$ are $< \epsilon$.
Of course $E_0 n$ and $E_1 n$ are finite and determined by the test. For this test the
average risk is less than

$$\epsilon(g_0 W_{0N} + g_1 W_{1N}) + c g_0 E_0 n + c g_1 E_1 n < \frac{u}{2} g_1 W_{1N} + c g_0 E_0 n + c g_1 E_1 n < u g_1 W_{1N}$$

for $W_{1N}$ large enough. This however contradicts the fact that the minimum
risk is $> u g_1 W_{1N}$, and proves the lemma.

7. Proof of the theorem. Let a given sequential probability ratio test $S_0$ be
defined by $B^*$, $A^*$; $B^* < 1 < A^*$. Let $\alpha_i(S_0)$ be the probability, according to
$S_0$, of rejecting $H_i$ when it is true. Let $c$ be fixed.

By Lemma 4, $B$ is a continuous function of $W_0$ and $W_1$. Let $\delta = A^*$ in
Lemma 8. Then there exists a pair $W_0$, $W_1$, with $W_0 = W_0(W_1, A^*)$, such that

$$A(W_0, W_1) = A^*; \qquad B(W_0, W_1) = B^*.$$

Hence the average risk

$$\sum_i g_i [W_i \alpha_i(S_0) + c E_i^0(n)],$$

corresponding to the sequential test $S_0$ is a minimum.

Now let $S_1$ be any other test for deciding between $H_0$ and $H_1$ and such that
$\alpha_i(S_1) \le \alpha_i(S_0)$, and $E_i^1(n)$ exists ($i = 0, 1$).

Then

$$\sum_i g_i [W_i \alpha_i(S_0) + c E_i^0(n)] \le \sum_i g_i [W_i \alpha_i(S_1) + c E_i^1(n)].$$

Since $\alpha_i(S_1) \le \alpha_i(S_0)$, we have

$$\sum_i g_i E_i^0(n) \le \sum_i g_i E_i^1(n).$$

Now $g_0$, $g_1$ were arbitrarily chosen (subject, of course, to the obvious restrictions).
Hence it must be that

$$E_i^0(n) \le E_i^1(n).$$

This, however, is the desired result.
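The test $S_0$ of this section can be illustrated by a small simulation. The following is only a sketch under assumptions that are not in the paper: the two hypotheses are taken to be Bernoulli with success probabilities `p0` and `p1`, and the function names `sprt_bernoulli` and `estimate` are hypothetical. Sampling continues while the probability ratio stays strictly between $B^*$ and $A^*$.

```python
import math
import random


def sprt_bernoulli(p0, p1, a_star, b_star, p_true, rng, max_n=100_000):
    """Run one sequential probability ratio test on Bernoulli draws.

    Returns (decision, n): decision is 1 if H1 is accepted, 0 if H0 is accepted.
    The Bernoulli model is an illustrative assumption, not part of the paper.
    """
    log_a, log_b = math.log(a_star), math.log(b_star)
    log_ratio = 0.0
    for n in range(1, max_n + 1):
        x = 1 if rng.random() < p_true else 0
        # Update the logarithm of the probability ratio after the n-th observation.
        if x:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= log_a:
            return 1, n
        if log_ratio <= log_b:
            return 0, n
    raise RuntimeError("no decision reached within max_n observations")


def estimate(p0, p1, a_star, b_star, p_true, trials=2000, seed=0):
    """Estimate the frequency of accepting H1 and the average sample size."""
    rng = random.Random(seed)
    results = [sprt_bernoulli(p0, p1, a_star, b_star, p_true, rng)
               for _ in range(trials)]
    accept_h1 = sum(d for d, _ in results) / trials
    mean_n = sum(n for _, n in results) / trials
    return accept_h1, mean_n
```

Calling `estimate(0.4, 0.6, 19.0, 1 / 19.0, p_true=0.4)` gives an empirical error frequency under $H_0$ and an empirical value of $E_0(n)$; the theorem says that no other test with error probabilities at most as large can have a smaller expected sample size.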

REFERENCES 

[1] A. Wald, "Sequential tests of statistical hypotheses", Annals of Math. Stat., Vol. 16
(1945), pp. 117-186.

[2] A. Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.

[3] Charles Stein, "A note on cumulative sums", Annals of Math. Stat., Vol. 17 (1946),
pp. 498-499.



LIMITING DISTRIBUTION OF A ROOT OF A DETERMINANTAL 

EQUATION 

By D. N. Nanda 

Institute of Statistics, University of North Carolina

1. Summary. The exact distribution of a root of a determinantal equation
when the roots are arranged in a monotonic order was obtained by S. N. Roy
[3] in 1943. A different method for deriving the distribution of any one of these
roots has been described by the author in [2]. In the present paper the limiting
forms of these distributions are obtained. This paper gives a method by
which the limiting distributions can be obtained without undergoing an inordinate
amount of mathematical labor.


2. Introduction. If $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ are two $p$-variate sample
matrices with $n_1$ and $n_2$ degrees of freedom and $S = \|xx'\|/n_1$ and $S^* = \|x^*x^{*\prime}\|/n_2$
are the covariance matrices which under the null hypothesis are independent
estimates of the same population covariance matrix, then the joint distribution
of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$
and $B = n_2 S^*$, was obtained by Hsu [1] in 1939 and is

$$R'(l, \mu, \nu) = \text{const} \cdot \prod_{i=1}^{l} \theta_i^{(\mu - 2)/2}(1 - \theta_i)^{(\nu - 2)/2} \prod_{i<j} (\theta_i - \theta_j), \qquad (0 \le \theta_l \le \theta_{l-1} \le \cdots \le \theta_1 \le 1), \tag{1}$$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$ and $\nu = n_2 - p + 1$. The distribution
density may be expressed as

$$R(l, m, n) = c(l, m, n) \prod_{i=1}^{l} [\theta_i^m (1 - \theta_i)^n] \prod_{i<j} (\theta_i - \theta_j), \tag{2}$$

where $m = \mu/2 - 1$ and $n = \nu/2 - 1$.


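3. Method. Let $\theta_i = \zeta_i/n$ in (2). The joint distribution reduces to

$$\frac{c(l, m, n)}{n^{\,l + lm + l(l-1)/2}} \prod_{i=1}^{l} [\zeta_i^m (1 - \zeta_i/n)^n] \prod_{i<j} (\zeta_i - \zeta_j)\, d\zeta_1 \cdots d\zeta_l, \qquad (0 \le \zeta_l \le \zeta_{l-1} \cdots \le \zeta_1 \le n). \tag{3}$$

As $n$ tends to infinity the limit of (3) is

$$K(l, m) \prod_{i=1}^{l} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum \zeta_i} \prod d\zeta_i, \qquad (0 \le \zeta_l \le \zeta_{l-1} \cdots \le \zeta_1 \le \infty). \tag{4}$$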


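The value of $K(l, m)$ is

$$K(l, m) = \lim_{n \to \infty} \frac{c(l, m, n)}{n^{\,l + lm + l(l-1)/2}} = \lim_{n \to \infty} \frac{\pi^{l/2} \prod_{i=1}^{l} \Gamma\left(\frac{l + 2m + 2n + i + 2}{2}\right)}{n^{\,l + lm + l(l-1)/2} \prod_{i=1}^{l} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{2n + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)}$$

$$= \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)} \lim_{n \to \infty} \frac{\prod_{i=1}^{l} \Gamma\left(\frac{l + 2m + 2n + i + 2}{2}\right)}{n^{\,l + lm + l(l-1)/2} \prod_{i=1}^{l} \Gamma\left(\frac{2n + i + 1}{2}\right)}.$$

By using Stirling's approximation for gamma functions and after simplification
we get

$$\lim_{n \to \infty} \frac{\prod_{i=1}^{l} \Gamma\left(\frac{l + 2m + 2n + i + 2}{2}\right)}{n^{\,l + lm + l(l-1)/2} \prod_{i=1}^{l} \Gamma\left(\frac{2n + i + 1}{2}\right)} = 1.$$

Hence

$$K(l, m) = \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)},$$

and therefore

$$\begin{aligned}
K(2, m) &= 2^{2m+1}/\Gamma(2m + 2),\\
K(3, m) &= 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)],\\
K(4, m) &= 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)],\\
K(5, m) &= 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].
\end{aligned} \tag{5}$$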
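The closed forms for $K(l, m)$ and the Stirling limit can be checked numerically. The sketch below is an illustration only; the function names are hypothetical, and the exponent $4m + 5$ used for $l = 4$ is the value obtained from the general gamma-function product with the duplication formula.

```python
import math


def k_general(l, m):
    """K(l, m) = pi^(l/2) / prod_{i=1..l} Gamma((2m+i+1)/2) * Gamma(i/2)."""
    log_den = sum(math.lgamma((2 * m + i + 1) / 2) + math.lgamma(i / 2)
                  for i in range(1, l + 1))
    return math.exp(0.5 * l * math.log(math.pi) - log_den)


def k_closed(l, m):
    """Closed forms for l = 2, 3, 4, 5 (l = 4 exponent derived, see lead-in)."""
    g = math.gamma
    if l == 2:
        return 2 ** (2 * m + 1) / g(2 * m + 2)
    if l == 3:
        return 2 ** (2 * m + 3) / (g(m + 1) * g(2 * m + 3))
    if l == 4:
        return 2 ** (4 * m + 5) / (g(2 * m + 2) * g(2 * m + 4))
    if l == 5:
        return 2 ** (4 * m + 9) / (3 * g(m + 1) * g(2 * m + 3) * g(2 * m + 5))
    raise ValueError("closed form listed only for l = 2, 3, 4, 5")


def stirling_ratio(l, m, n):
    """Ratio of gamma products divided by the power of n; tends to 1 as n grows."""
    log_num = sum(math.lgamma((l + 2 * m + 2 * n + i + 2) / 2)
                  for i in range(1, l + 1))
    log_den = sum(math.lgamma((2 * n + i + 1) / 2) for i in range(1, l + 1))
    power = l + l * m + l * (l - 1) / 2
    return math.exp(log_num - log_den - power * math.log(n))
```

Working with `math.lgamma` rather than `math.gamma` keeps the ratio finite for large `n`, where the individual gamma values would overflow.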

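Let

$$G_{l,m}(x) = K(l, m) \int_{0 \le \zeta_l \le \cdots \le \zeta_1 \le x} \prod_{i=1}^{l} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum \zeta_i} \prod d\zeta_i. \tag{6}$$

It can easily be observed that

$$G_{l,m}(x) = \Pr(\zeta_1 \le x) = \lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\left(\theta_1 \le \frac{x}{n}\right).$$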





Thus the limiting form of the distribution of the largest root can be obtained by 
integrating the density given in (4) according to the method described by the 
author in [2]. It is, however, observed that the mathematical labor is reduced 
considerably by adopting the following method. 

Referring to the results of the exact distribution of the largest root given in
[2], let $F_{l,m,n}(x) = (0, l, l - 1, \cdots, 1, x; m, n)$; thus $F_{2,m,n}(x) = (0, 2, 1, x; m, n)$
and $F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n)$. Then $c(l, m, n)F_{l,m,n}(x)$ is the probability
that none of the roots $\theta_i$ exceeds $x$, and is thus the cumulative distribution function
of the greatest root. We shall show that $\lim_{n \to \infty} c(l, m, n)F_{l,m,n}(x/n) = G_{l,m}(x)$.
The reader is, however, asked to refer to [2] for the detailed explanation
of the notations and certain mathematical operations used in this paper.

4. Limiting distribution of the largest root. We will derive the distribution
of the largest root for $l = 2$ and 3 by the two methods. A straightforward
method will be named Method A. A second method, which proves to be very simple
and easy, will be called Method B.

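(a) $l = 2$

(i) Method A. We have,

$$\Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \int_{0 \le \zeta_2 \le \zeta_1 \le x} (\zeta_1 \zeta_2)^m (\zeta_1 - \zeta_2)\, e^{-(\zeta_1 + \zeta_2)}\, d\zeta_1\, d\zeta_2.$$

By using the method described in [2], we have

$$G_{2,m}(x) = K(2, m) \left[\int_{0 \le \zeta_2 \le \zeta_1 \le x} - \int_{0 \le \zeta_1 \le \zeta_2 \le x}\right] \zeta_1^{m+1} \zeta_2^{m}\, e^{-(\zeta_1 + \zeta_2)}\, d\zeta_1\, d\zeta_2$$

$$= K(2, m)\{T_0^x (y, 1, x; m + 1) - T_0^x (0, 1, y; m + 1)\},$$

where

$$T_0^x\, g(y) = \int_0^x g(y)\, y^m e^{-y}\, dy,$$

and

$$(a, 1, b; m + 1) = \int_a^b \zeta^{m+1} e^{-\zeta}\, d\zeta = (a^{m+1} e^{-a} - b^{m+1} e^{-b}) + (m + 1)(a, 1, b; m). \tag{7}$$

Hence,

$$G_{2,m}(x) = K(2, m)\, T_0^x [y^{m+1} e^{-y} - x^{m+1} e^{-x} + (m + 1)(y, 1, x; m) + y^{m+1} e^{-y} - (m + 1)(0, 1, y; m)]$$

$$= K(2, m)\, T_0^x [2y^{m+1} e^{-y} - x^{m+1} e^{-x}],$$

as $T_0^x [(y, 1, x; m) - (0, 1, y; m)] = 0$.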



Therefore

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \left[2 \int_0^x y^{2m+1} e^{-2y}\, dy - x^{m+1} e^{-x} \int_0^x y^m e^{-y}\, dy\right]. \tag{8}$$

When $x = \infty$, $G_{2,m}(x) = 1$; hence $K(2, m) = 2^{2m+1}/\Gamma(2m + 2)$, the value given
in (5).
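Formula (8) is easy to evaluate numerically. The sketch below is an illustration under assumptions: `simpson` and `g2` are hypothetical names, the integrals are computed with a composite Simpson rule, and `m` is taken to be a non-negative integer so that the integrands are smooth at 0. For $m = 0$ the integrals in (8) can be done by hand and give $1 - e^{-2x} - 2xe^{-x}$, which serves as a check.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    if b == a:
        return 0.0
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def g2(x, m):
    """Limiting c.d.f. of the largest root for l = 2, following (8)."""
    k2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    first = 2 * simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    second = x ** (m + 1) * math.exp(-x) * simpson(
        lambda u: u ** m * math.exp(-u), 0.0, x)
    return k2 * (first - second)
```

Evaluating `g2(x, m)` at a large `x` should return a value close to 1, which is the normalization used in the text to identify $K(2, m)$.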

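Now we shall derive the result by Method B.

(ii) Method B.

$$F_{2,m,n}(x) = (0, 2, 1, x; m, n) = \frac{1}{m + n + 2} \left[2 \int_0^x y^{2m+1} (1 - y)^{2n+1}\, dy - x^{m+1} (1 - x)^{n+1} \int_0^x y^m (1 - y)^n\, dy\right],$$

a result given in [2].

Replacing $x$ by $x/n$, we get

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2} \left[2 \int_0^{x/n} y^{2m+1} (1 - y)^{2n+1}\, dy - (x/n)^{m+1} (1 - x/n)^{n+1} \int_0^{x/n} y^m (1 - y)^n\, dy\right];$$

also, letting $y = u/n$, we have

$$(0, 2, 1, x/n; m, n) = \frac{1}{(m + n + 2)\, n^{2m+2}} \left[2 \int_0^x u^{2m+1} (1 - u/n)^{2n+1}\, du - x^{m+1} (1 - x/n)^{n+1} \int_0^x u^m (1 - u/n)^n\, du\right].$$

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr(\theta_1 \le x/n) = \lim_{n \to \infty} c(2, m, n)(0, 2, 1, x/n; m, n)$$

$$= \frac{2^{2m+1}}{\Gamma(2m + 2)} \left[2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du\right],$$

which is the same as (8), obtained by Method A.

(b) $l = 3$.

(i) Method A. We have

$$\Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \int_{0 \le \zeta_3 \le \zeta_2 \le \zeta_1 \le x} \prod_{i=1}^{3} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum \zeta_i} \prod d\zeta_i$$

$$= K(3, m) \int_{0 \le \zeta_3 \le \zeta_2 \le \zeta_1 \le x} (\zeta_1 \zeta_2 \zeta_3)^m e^{-(\zeta_1 + \zeta_2 + \zeta_3)} \{1, 2, 3\}\, d\zeta_1\, d\zeta_2\, d\zeta_3,$$

where $\{1, 2, 3\} = \zeta_1 \zeta_2 \{1, 2\} + \zeta_3 \zeta_1 \{3, 1\} + \zeta_2 \zeta_3 \{2, 3\}$, as given in [2].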





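$$= K(3, m) \left[\int_{0 < \zeta_3 < \zeta_2 < \zeta_1 < x} + \int_{0 < \zeta_1 < \zeta_3 < \zeta_2 < x} + \int_{0 < \zeta_2 < \zeta_1 < \zeta_3 < x}\right] \zeta_3^m e^{-\zeta_3} (\zeta_1 \zeta_2)^{m+1} e^{-(\zeta_1 + \zeta_2)} \{1, 2\}\, d\zeta_1\, d\zeta_2\, d\zeta_3$$

$$= K(3, m)\{T_0^x (y, 2, 1, x; m + 1) - T_0^x (0, 2, y, 1, x; m + 1) + T_0^x (0, 2, 1, y; m + 1)\},$$

where

$$(a, 2, 1, b; m) = \int_{a < \zeta_2 < \zeta_1 < b} (\zeta_1 \zeta_2)^m (\zeta_1 - \zeta_2)\, e^{-(\zeta_1 + \zeta_2)}\, d\zeta_1\, d\zeta_2.$$

We have already obtained

$$(0, 2, 1, x; m) = G_{2,m}(x)/K(2, m) = \left\{2 \int_0^x y^{2m+1} e^{-2y}\, dy - x^{m+1} e^{-x} \int_0^x y^m e^{-y}\, dy\right\}$$

as given in (8).

We also need the following results which are obtained by the method described
for $l = 2$.

$$(a, 2, 1, b; m) = \left\{2 \int_a^b u^{2m+1} e^{-2u}\, du - (a^{m+1} e^{-a} + b^{m+1} e^{-b}) \int_a^b u^m e^{-u}\, du\right\}, \tag{10}$$

$$(a, 2, b, 1, c; m) = \left\{b^{m+1} e^{-b} \int_a^c u^m e^{-u}\, du - a^{m+1} e^{-a} \int_b^c u^m e^{-u}\, du - c^{m+1} e^{-c} \int_a^b u^m e^{-u}\, du\right\}. \tag{11}$$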

Using these results we have

$$G_{3,m}(x) = K(3, m)\, T_0^x \Big\{2 \int_y^x u^{2m+3} e^{-2u}\, du - (y^{m+2} e^{-y} + x^{m+2} e^{-x}) \int_y^x u^{m+1} e^{-u}\, du$$

$$- y^{m+2} e^{-y} \int_0^x u^{m+1} e^{-u}\, du + x^{m+2} e^{-x} \int_0^y u^{m+1} e^{-u}\, du + 2 \int_0^y u^{2m+3} e^{-2u}\, du - y^{m+2} e^{-y} \int_0^y u^{m+1} e^{-u}\, du\Big\}.$$

Simplifying we get

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \Big\{2 \int_0^x u^{2m+3} e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2 \int_0^x u^{m+1} e^{-u}\, du \int_0^x u^{2m+2} e^{-2u}\, du$$

$$- x^{m+2} e^{-x} \Big[2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du\Big]\Big\}, \tag{12}$$

where $K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)]$.





(ii) Method B.

$$F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n) = \frac{1}{m + n + 3} [2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n)$$

$$- 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)],$$

a result given in [2].

Replacing $x$ by $x/n$ and putting $u/n$ for the variate $y$ of integration, we have,

$$(0, 3, 2, 1, x/n; m, n) = \frac{1}{m + n + 3} \Big\{\frac{2}{n^{3m+5}} \int_0^x u^{2m+3} (1 - u/n)^{2n+1}\, du \int_0^x u^m (1 - u/n)^n\, du$$

$$- \frac{2}{n^{3m+5}} \int_0^x u^{m+1} (1 - u/n)^n\, du \int_0^x u^{2m+2} (1 - u/n)^{2n+1}\, du$$

$$- \frac{x^{m+2} (1 - x/n)^{n+1}}{n^{3m+4} (m + n + 2)} \Big[2 \int_0^x u^{2m+1} (1 - u/n)^{2n+1}\, du - x^{m+1} (1 - x/n)^{n+1} \int_0^x u^m (1 - u/n)^n\, du\Big]\Big\}.$$

Hence

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\left(\theta_1 \le \frac{x}{n}\right) = \lim_{n \to \infty} c(3, m, n) F_{3,m,n}(x/n)$$

$$= K(3, m) \Big\{2 \int_0^x u^{2m+3} e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2 \int_0^x u^{2m+2} e^{-2u}\, du \int_0^x u^{m+1} e^{-u}\, du$$

$$- x^{m+2} e^{-x} \Big[2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du\Big]\Big\},$$

where

$$K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)].$$

This result is the same as (12) obtained by Method A.
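As a numerical sanity check on (12), the limiting distribution function for $l = 3$ can be evaluated directly. This is a sketch only: `simpson` and `g3` are hypothetical names, a composite Simpson rule replaces the exact integrals, and `m` is assumed to be a non-negative integer.

```python
import math


def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    if b == a:
        return 0.0
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def g3(x, m):
    """Limiting c.d.f. of the largest root for l = 3, following (12)."""
    k3 = 2 ** (2 * m + 3) / (math.gamma(m + 1) * math.gamma(2 * m + 3))

    def single(p):
        # Integral of u^p e^{-u} over [0, x].
        return simpson(lambda u: u ** p * math.exp(-u), 0.0, x)

    def double(p):
        # Integral of u^p e^{-2u} over [0, x].
        return simpson(lambda u: u ** p * math.exp(-2 * u), 0.0, x)

    tail = x ** (m + 2) * math.exp(-x)
    inner = 2 * double(2 * m + 1) - x ** (m + 1) * math.exp(-x) * single(m)
    return k3 * (2 * double(2 * m + 3) * single(m)
                 - 2 * single(m + 1) * double(2 * m + 2)
                 - tail * inner)
```

The value of `g3` at a large argument should be close to 1, and the values on an increasing grid of `x` should increase, as expected of a distribution function.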

We have thus shown that Method B is applicable for obtaining the limiting
forms of the distribution of the largest root and that it is much simpler as compared
to the straightforward method called Method A here.
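The convergence that Method B relies on can be observed numerically for $l = 2$. The sketch below is illustrative and rests on assumptions: the function names are hypothetical, $c(2, m, n)$ is computed from the gamma-function product for $c(l, m, n)$ with $l = 2$, and the comparison uses the $m = 0$ case, where the limit (8) reduces to $1 - e^{-2x} - 2xe^{-x}$.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def log_c2(m, n):
    """log c(2, m, n) from the gamma-function product with l = 2."""
    total = math.log(math.pi)
    for i in (1, 2):
        total += math.lgamma((2 * m + 2 * n + i + 4) / 2)
        total -= math.lgamma((2 * m + i + 1) / 2)
        total -= math.lgamma((2 * n + i + 1) / 2)
        total -= math.lgamma(i / 2)
    return total


def f2(x, m, n):
    """F_{2,m,n}(x) = (0, 2, 1, x; m, n) for 0 <= x <= 1."""
    first = 2 * simpson(
        lambda y: y ** (2 * m + 1) * (1 - y) ** (2 * n + 1), 0.0, x)
    second = x ** (m + 1) * (1 - x) ** (n + 1) * simpson(
        lambda y: y ** m * (1 - y) ** n, 0.0, x)
    return (first - second) / (m + n + 2)


def exact_cdf_largest(x, m, n):
    """Exact c.d.f. c(2, m, n) F_{2,m,n}(x) of the largest root for l = 2."""
    return math.exp(log_c2(m, n)) * f2(x, m, n)


def limit_cdf_m0(x):
    """Limit (8) evaluated in closed form for m = 0."""
    return 1 - math.exp(-2 * x) - 2 * x * math.exp(-x)
```

Comparing `exact_cdf_largest(x / n, 0, n)` with `limit_cdf_m0(x)` for increasing `n` shows the gap shrinking, which is the statement $\lim c(2, m, n) F_{2,m,n}(x/n) = G_{2,m}(x)$ in numerical form.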

The limiting distributions for the largest root for $l = 4$ and 5 are listed below.

(c) $l = 4$.

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\left(\theta_1 \le \frac{x}{n}\right) = G_{4,m}(x)$$

- ^ "> { 2 jf ”“* v “ gtfs; - 2 jf *• 

[2 f *. - £ »-«-• *, + <» + 2 ) ] 



2m+3 - 2u 

w C 


du 


A'(2, m + 1) 


— x 


K(3~mjf’ 



where

$$K(4, m) = 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)].$$

(d) $l = 5$.

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\left(\theta_1 \le \frac{x}{n}\right) = G_{5,m}(x)$$

= (5, m) |2 jf 


where 


uim+ , - 2 U du G S , m (x) 


- 2 [u 2m+e e~ iu du 
Jo 


k( 3, m) 

T 2 fn 2m+ V 2u dn f X u m e- U du - 2 f n 2m+ V 2tt dn 

|_ Jo Jo Jo 

f m+1 —« 1 m+3 -ar ^2, m(^) . / , q\ ^3, m(^) | 

•i. M e dM " x e xm + ( * + 3) kmJ 

+ 2 j[ V m+6 e _2 “ 12 J\ 2m H e~ 2u du J*u m e~ u du 

- 2 (V ra+ V 2u rfii [*u" +2 e~ u du - x m+ V* 

Jo Jo 

- t-e-fre-’*, + <» + 2) ® 2 -«]} 

- 2 (/>< - .T mH4 c 1 , 

A(3, m + 1) Jo A (4, m)J 

$$K(5, m) = 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].$$


5. Limiting distribution of the smallest root. It was shown in [2] that the
exact distribution of the smallest root can be obtained by using the relation

$$\Pr(\theta_l \le x) = 1 - \Pr(\theta_1 \le 1 - x \mid \nu, \mu).$$

This relation, however, does not help in obtaining the limiting distribution of
the smallest root from that of the largest root. The limiting distribution of the
smallest root can be obtained by the method illustrated below.

(a) $l = 2$.

The exact distribution of the smallest root $\theta_2$ can be expressed as

$$\Pr(\theta_2 \le x) = c(2, m, n)\{(0, 2, 1, x; m, n) + (0, 2, x, 1, z; m, n)\},$$

where $z = 1$. Replacing $x$ by $x/n$, we get

$$\Pr(\theta_2 \le x/n) = c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\},$$

where

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2} \left[2 \int_0^{x/n} y^{2m+1} (1 - y)^{2n+1}\, dy - (0, x/n; m + 1, n + 1) \int_0^{x/n} y^m (1 - y)^n\, dy\right],$$

and

$$(0, 2, x/n, 1, z; m, n) = \frac{1}{m + n + 2} \left[(0, x/n; m + 1, n + 1) \int_0^{z} y^m (1 - y)^n\, dy - (0, z; m + 1, n + 1) \int_0^{x/n} y^m (1 - y)^n\, dy\right],$$

as obtained from (6) of [2].

The limiting distribution of $\theta_2$ is

$$\lim_{n \to \infty} \Pr(\theta_2 \le x/n) = \lim_{n \to \infty} c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\}. \tag{13}$$

Putting $u/n$ for $y$, the variate of integration, and allowing $n$ to tend to infinity,
we have

$$\lim_{n \to \infty} c(2, m, n)(0, 2, 1, x/n; m, n) = K(2, m) \left\{2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du\right\},$$

and

$$\lim_{n \to \infty} c(2, m, n)(0, 2, x/n, 1, z; m, n) = K(2, m)\, x^{m+1} e^{-x} \int_0^\infty u^m e^{-u}\, du = K(2, m)\, x^{m+1} e^{-x}\, \Gamma(m + 1).$$

Substituting these results in (13) we have

$$\lim_{n \to \infty} \Pr(n\theta_2 \le x) = \lim_{n \to \infty} \Pr(\theta_2 \le x/n)$$

$$= K(2, m) \left\{2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du + x^{m+1} e^{-x}\, \Gamma(m + 1)\right\},$$

where

$$K(2, m) = 2^{2m+1}/[\Gamma(2m + 2)].$$
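The limiting distribution of the smallest root for $l = 2$ can be evaluated in the same way as that of the largest root. The following sketch makes the usual assumptions for these illustrations: `simpson` and `g2_smallest` are hypothetical names, integrals are approximated by a composite Simpson rule, and `m` is a non-negative integer. For $m = 0$ the expression above simplifies by hand to $1 - e^{-2x}$, which gives a convenient check.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    if b == a:
        return 0.0
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def g2_smallest(x, m):
    """Limiting c.d.f. of the smallest root for l = 2."""
    k2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    first = 2 * simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    weight = x ** (m + 1) * math.exp(-x)
    lower = simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    # The last term uses Gamma(m + 1), the integral of u^m e^{-u} over (0, inf).
    return k2 * (first - weight * lower + weight * math.gamma(m + 1))
```

Since the smallest root never exceeds the largest, `g2_smallest(x, m)` should be at least as large as the value of (8) at the same `x`.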

(b) $l = 3$.

The exact distribution of the smallest root can be expressed as

$$\Pr(\theta_3 \le x) = c(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n) + (0, 3, x, 2, 1, z; m, n)],$$

where $z = 1$.

Replacing $x$ by $x/n$ and allowing $n$ to tend to infinity we have

$$\lim_{n \to \infty} \Pr(n\theta_3 \le x) = \lim_{n \to \infty} c(3, m, n)[(0, 3, 2, 1, x/n; m, n) + (0, 3, 2, x/n, 1, z; m, n) + (0, 3, x/n, 2, 1, z; m, n)]. \tag{14}$$





The values of these components on the right hand side of the above equation
are given below.

$$\lim_{n \to \infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n) = G_{3,m}(x), \text{ given by (12)},$$

lim c(3, m, n)(0, 3, 2, s/n, 1, 2; m, n) 

n-*ao 

= K( 3, m) jjfV e- du ^2 J\ 2m+l e~ 2u du 


- x M+1 c _I / u m+I e““ du I - *- t V 


jfVe““ du] 


+ x m+ V 


c-^2 jfu 2 ”‘ + V 2 “du 
£V +I <T U du J*u m e~ u du 
2 c ~“j[> +i e ~ 


- 2 L 


u m+ V“du / u 2m+i e~ 2u du 


ttuu 

lim c(3, m, n)(0, 3, x/n, 2, 1, z; m, n) = A'(3, m ) «" " du |^2 J ■ 

- x m+2 <f* £°u m+I e- du] - x M+2 «"■ j^2 J~u 2m+1 e~ 2u du - x m+1 e“* j 


„ 2 m +3 — 2 tt 7 . 

w e cm 


+ x mM e- 1 


u 2m+1 c~ 2u du - x m+1 e~* 


J u m c “ du 


j*u m < 1 du J\ m e~ u du - 2 jfu m H f-“ du f“u m+i e~ 2u du 


Substituting in (14) we have, 

lim Pr (h0j < x) = 12 2m+3 /[r(m + l)r(2w + 3)]) 


.{2 fu 2m+ V 2u du [ X u m e~ u du + 2 fu 2m+ V 2u du fu n e 
Jo Jo Jo Jx 

- 2 f'°u m+1 e~ u du f X u tm+2 e~ 2u du - 2 fu ro+ V“du ru 2m+2 
Jo t Jo Jo J x 

- 2x m+2 e~ z fu 2m+1 e“ 2u du - 2x m1 V* f u 2nU e~ u du 
Jo Jo 

+ X 2m+a e - 2x (jfw"V u du + 


zm 1-2 —z« » 
^ T ffM 


— 2x m+2 c _x u 2m+1 <f 2 “du - 2x m1 V* u 2m "e~ 2u du 


+ x 2m+3 e- 2x [ / u"r-dM + 


lim Pr («0j < x) = 2 2m+, /[r(rn + l)r(2m + 3)] 


J u m e “ du) 


•| r(2 ™J; 4) j[ V <r“ du + 2 j[V m+3 <r 2 “ du JTV«- du 

- 2r(»i + 2) fu 2m+2 e" 2 “du - 2 fu m+I e _u du fu^e^du 
Jo Jo Jx 

- r . ( ^,± 2) x m+2 - x m+2 e~ x jju 2m+1 e~ 2 “ du + r(m + l)x 2m+2 e 

+ x 2m+3 e~ 2x [ X u m e~ u du\. 





Thus we have seen that this method can be used for obtaining the limiting 
distribution of the smallest root for any value of l. 


6. Limiting distribution of any intermediate root. The above method can
also be used for obtaining the limiting distribution of any intermediate root.
We shall give the distribution of $\theta_2$ for $l = 3$. We have

$$\Pr(\theta_2 \le x) = c(3, m, n)\{(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)\}, \tag{16}$$

where $z = 1$.

The $\lim_{n \to \infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n)$ and $\lim_{n \to \infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$
are given by (12) and (15) respectively. Substituting these results
in (16) and simplifying we get


lim Pr (nO 2 <») = -,- 


i fur 

Jo 


€ M du 


Y(m + \)Y(2m + 3) 

r-2m H e -2u dn _ 2 f u m+l e -u du [*^n+2 g -2u ^ 
Jo Jo Jo 


- 4x m+ V‘ 


fu 2m,t , 

Jo 


c~ 2u du + 2x 2m+ V 2jt u m c” u du 
Jo 


-j-x mh2 e * f u m+1 e * du [ u m e “ du — f u n e u du f u mil e 
_Jx Jq J x Jq 



Or, 

02m+3 f ax 

n lim Pr(nflj < x) = r(m + 1)r(2w + 3) \ 2r(w + 1} J 0 u du 

- 2T(m + 2) f u 2 m ~ l 2 c~ 2u du - 4x m+2 e~ x f u 2 mn e~ 2u du + 2x 2ra+ V 2jr 
Jo Jo 

• J u m du + x n{ 2 e~ x u mn e~ u du J ii n e~ u du 


- j” u m e~ u du J\* u n41 dajj . 


Thus the limiting distribution of any intermediate root can be obtained by the 
above method. 


7. Further problems. The limiting distribution of the largest root is found 
to be very helpful in obtaining the distribution of the sum of roots when m = 0. 
This condition implies that when the results are applied to canonical correlations 
the numbers of variates in the two sets differ by unity. The distributions for 
the sum of roots have been derived under the above condition for l = 2, 3 and 4 
and the results are being presented in the next paper of this series. 

8. Acknowledgements. I am extremely thankful to Drs. P. L. Hsu and 
Harold Hotelling for guidance and help in this research. 





REFERENCES 

[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of
Eugenics, Vol. 9 (1939), pp. 250-258.

[2] D. N. Nanda, "Distribution of a root of a determinantal equation," Annals of Math.
Stat., Vol. 18 (1947).

[3] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and
any intermediate of the p-statistics on the null hypothesis," Sankhyā, Vol. 8
(1943).



ON A SOURCE OF DOWNWARD BIAS IN THE ANALYSIS OF VARIANCE 

AND COVARIANCE 

By William G. Madow 

Institute of Statistics, University of North Carolina

1. Summary. It is shown that if, in the analysis of variance, the experiments
are not in a state of statistical control due to variations in the true means, then
the test will have a downward bias. The power function of the analysis of variance
test is obtained when this downward bias is present.

2. Introduction. To introduce the discussion of this bias let us consider the
generalized Student's hypothesis.

Let $y_{11}, \cdots, y_{kN}$ be normally and independently distributed with variance
$\sigma^2$, and let the expected value of $y_{i\nu}$ be $a_{i\nu}$.$^1$ Then the generalized Student's
hypothesis is

(Null hypothesis) $a_{i\nu} = a$

and the class of alternative hypotheses against which the null hypothesis is
tested is

(Class A) $a_{i\nu} = a_i$.

From the statement of the null hypothesis and the alternatives of Class A it
follows that both the null hypothesis and the alternatives of Class A require that

$$a_{i1} = \cdots = a_{iN}. \tag{1.1}$$

Since our experiments are rarely in such perfect statistical control that (1.1)
holds whether or not the null hypothesis is true, it becomes reasonable to investigate
the existing F test when instead of the alternatives to the null hypothesis
being of Class A, they are simply Class B:

(Class B) Equation (1.1) is false for at least one value of $i$.

Furthermore, for many practical purposes we would prefer to test the average
null hypothesis:

(Average null hypothesis) $\bar a_i = \bar a$,

where $N\bar a_i = a_{i1} + \cdots + a_{iN}$ and $k\bar a = \bar a_1 + \cdots + \bar a_k$, instead of the null
hypothesis, the alternatives to the average null hypothesis being of Class C.

(Class C) The $a_{i\nu}$ can have any values such that not all the $\bar a_i$ equal $\bar a$.

$^1$ Throughout this paper the letter $i$ will assume all integral values from 1 to $k$, the letters
$\mu$, $\nu$ will assume all integral values from 1 to $N$, the letters $\gamma$, $\eta$ will assume all integral values
from 1 to $m$, the letter $\alpha$ will assume all integral values from $n_1 + \cdots + n_{\gamma - 1} + 1$ to
$n_1 + \cdots + n_\gamma$, ($n_0 = 0$), and $\alpha_1$, $\alpha_2$ will assume all integral values from 0 to $\infty$.



The F-test of the null hypothesis against the alternatives of Class A is, as is
well known,

$$F = \frac{k(N - 1) \sum_{i,\nu} (\bar y_i - \bar y)^2}{(k - 1) \sum_{i,\nu} (y_{i\nu} - \bar y_i)^2},$$

where $N\bar y_i = y_{i1} + \cdots + y_{iN}$ and $k\bar y = \bar y_1 + \cdots + \bar y_k$. To answer the questions
formulated above concerning the F-test when the average null hypothesis
or the alternatives of classes B or C are true, we must then calculate the distribution
of $F$ under these various conditions. This is done in Section 3.

A somewhat informal means of obtaining the conclusions is that of studying
$F$ itself. Taking the expected values of the numerator and denominator of $F$
and defining

$$\phi_1^2 = \frac{\sum_{i,\nu} (\bar a_i - \bar a)^2}{(k - 1)\sigma^2}, \qquad \phi_2^2 = \frac{1}{k(N - 1)\sigma^2} \sum_{i,\nu} (a_{i\nu} - \bar a_i)^2,$$

we obtain as the ratio of the two expected values

$$\bar F = \frac{1 + \phi_1^2}{1 + \phi_2^2}.$$

It is well known that, in general, the larger the value of $N$ the more closely will $F$
approximate $\bar F$. From this fact it is easy to see why if the null hypothesis is true,
then $F \sim 1$, whereas if the null hypothesis is false but an alternative of Class A
is true then

$$F \sim 1 + \phi_1^2 > 1$$

so that large values of $F$ become more likely than if the null hypothesis were true.
However, if an alternative of Class B is true then

$$F \sim \frac{1 + \phi_1^2}{1 + \phi_2^2}$$

so that if $\phi_1^2 < \phi_2^2$, smaller values of $F$ occur more frequently than indicated
by the null hypothesis. Thus we would tend to accept the null hypothesis more
frequently than desired when it is false. Even when the null hypothesis is false
so that $\phi_1^2 > 0$, the values of $F$ will tend to be less if $\phi_2^2 > 0$ than if $\phi_2^2 = 0$
whether or not $\phi_1^2 < \phi_2^2$. Not only is the probability of an error of the first kind
less than the value $\epsilon$ we may have previously selected, but also the power of the
test is less than would be indicated by Tang's tables [1]. The lack of statistical
control represented by variation of expected values within a class has the
effect of making it less likely than the standard F-test indicates that the null
hypothesis will be rejected whether it be true or false. Furthermore, even for
relatively low values of $\phi_2^2$, the reductions in the probabilities of rejection may
be over 40 per cent as indicated by some examples given below.
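The informal argument above can be made concrete by computing $\phi_1^2$, $\phi_2^2$ and the ratio of expected values for a given table of true means. The sketch below is illustrative only; the function name `expected_f_ratio` and the example means are assumptions, not taken from the paper.

```python
def expected_f_ratio(means, sigma2):
    """Return (phi1_sq, phi2_sq, ratio) for true means a_{i,nu}.

    means[i][v] is the expected value of observation v in class i,
    with k classes of N observations each; sigma2 is the common variance.
    """
    k = len(means)
    n_per = len(means[0])
    class_avg = [sum(row) / n_per for row in means]
    grand_avg = sum(class_avg) / k
    # Between-class parameter: N * sum_i (abar_i - abar)^2 / ((k - 1) sigma^2).
    phi1_sq = n_per * sum((a - grand_avg) ** 2 for a in class_avg) / ((k - 1) * sigma2)
    # Within-class parameter: sum_{i,nu} (a_{i,nu} - abar_i)^2 / (k (N - 1) sigma^2).
    phi2_sq = sum((a - class_avg[i]) ** 2
                  for i, row in enumerate(means)
                  for a in row) / (k * (n_per - 1) * sigma2)
    return phi1_sq, phi2_sq, (1 + phi1_sq) / (1 + phi2_sq)
```

For example, `expected_f_ratio([[-1, 1], [-1, 1]], 1.0)` describes a case where the average null hypothesis holds but (1.1) fails; the ratio falls below 1, which is the downward bias discussed in the text.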

If the average null hypothesis is true but (1.1) is false it follows that

$$F \sim \frac{1}{1 + \phi_2^2}$$

so that the full effect of the downward bias occurs in that case. Thus in cases
where statistical control is lacking, to test the average null hypothesis by the F-test
may well result in accepting the hypothesis when it is false. If the null
hypothesis is rejected, however, then we can expect that the differences among
the true means are even larger than indicated by Tang's tables.

To illustrate, it is shown in Section 4 that if $k = 5$ and $N = 7$, then the probability
of rejecting the average null hypothesis when it is true, but (1.1) is false
will not be the preassigned .05 but something less than .03 if $\phi_2^2 > .05$. Furthermore,
if $\phi_2^2 > .07$, then the power of the F tests for this example will be reduced
by at least 40 per cent whatever the value of $\phi_1^2$.

The conclusions reached above remain valid for the analysis of variance and
covariance in general. In the general case however, the value of the average
null hypothesis in simplifying the analysis may be considerably reduced since
the parameter $\phi_1^2$ no longer vanishes when the average null hypothesis is true.
For example, if $E y_\nu = \beta_\nu x_\nu$, and if the average null hypothesis is $\bar\beta = 0$, where
$N\bar\beta = \beta_1 + \cdots + \beta_N$, then upon calculating

$$\phi_1^2 = \frac{\left(\sum_\nu \beta_\nu x_\nu^2\right)^2}{\sigma^2 \sum_\nu x_\nu^2}$$

we see that $\phi_1^2$ will not vanish in general if $\bar\beta$ vanishes.
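A two-observation example shows why $\phi_1^2$ need not vanish in the regression case. This is a sketch with hypothetical names and invented numbers, intended only to illustrate the formula for $\phi_1^2$ given above.

```python
def phi1_sq_regression(betas, xs, sigma2):
    """phi_1^2 = (sum_v beta_v x_v^2)^2 / (sigma^2 sum_v x_v^2)."""
    numerator = sum(b * x * x for b, x in zip(betas, xs)) ** 2
    denominator = sigma2 * sum(x * x for x in xs)
    return numerator / denominator


# The coefficients average to zero, yet phi_1^2 is positive because the
# x-values weight them unequally.
example = phi1_sq_regression([1.0, -1.0], [1.0, 2.0], 1.0)
```

With equal `xs` the weights coincide and the same coefficients give $\phi_1^2 = 0$, which matches the "pure" analysis of variance situation described in the next paragraph.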

Although as shown above the average null hypothesis may not have too great 
importance in the case of regression, yet if the “variance between treatments” 
is a function of arithmetic means of the random variables as in the “pure” 
analysis of variance the average null hypothesis may well be very useful. Simple 
examples of this are provided by the randomized block, Latin square, and similar 
designs. 

The distributions that we shall need are given in Section 3. The inequalities 
on the basis of which the bias is demonstrated are obtained in Section 4. 

It would be highly desirable to have Tang's tables extended so that they might
provide the answers to the questions raised by this source of bias. In the absence
of such extensions the inequalities of Section 4 may give some rough
idea, but these inequalities are not sharp enough.


3. The calculation of the distributions. The following theorem was proved,
although not explicitly stated, as part of an earlier note [2]. (Note the change
from $x_i$ to $y_i$ as the notation for the random variable.)





Theorem 1. Let $y_1, \cdots, y_N$ be normally and independently distributed with
variance $\sigma^2$ and means $a_1, \cdots, a_N$ and let $q_1, \cdots, q_m$ be quadratic forms

$$q_\gamma = \sum_{\mu,\nu} a^{(\gamma)}_{\mu\nu}\, y_\mu y_\nu$$

in $y_1, \cdots, y_N$ of ranks $n_1, \cdots, n_m$. Then, if an orthogonal transformation

$$y_\mu = \sum_{\nu} c_{\mu\nu} z_\nu$$

exists such that

$$q_\gamma = \sum_{\alpha} z_\alpha^2, \tag{2.1}$$

it follows that the random variables $q_\gamma/\sigma^2$ are independently distributed in $\chi'^2$ distributions
with degrees of freedom $n_1, \cdots, n_m$ and parameters $\lambda_1, \cdots, \lambda_m$, where

$$\lambda_\gamma = \frac{\sum_{\mu,\nu} a^{(\gamma)}_{\mu\nu}\, a_\mu a_\nu}{2\sigma^2} = \frac{E q_\gamma}{2\sigma^2} - \frac{n_\gamma}{2}.$$

Various conditions for the existence of an orthogonal transformation satisfying
(2.1) of Theorem 1 have been given. Among these are:

1. Cochran's [3] condition. If $\sum_\gamma q_\gamma = \sum_\nu y_\nu^2$ then a necessary and sufficient
condition for the existence of an orthogonal transformation satisfying (2.1)
is $\sum_\gamma n_\gamma = N$.

2. Craig's [4] condition. If $A_\gamma$ denotes the matrix $(a^{(\gamma)}_{\mu\nu})$ then a necessary
and sufficient condition for the existence of an orthogonal transformation satisfying
(2.1) is $A_\gamma A_\eta = \delta_{\gamma\eta} A_\gamma$ where $\delta_{\gamma\eta}$ is the null matrix if $\gamma \ne \eta$, and the identity
matrix if $\gamma = \eta$.

3. Linear Hypothesis condition. (Kolodziejczyk [5]) If $\lambda$ be the likelihood
ratio test of a linear hypothesis and if $E^2 = 1 - \lambda^{2/N}$, then $E^2 = q_1/(q_1 + q_2)$
and an orthogonal transformation exists satisfying (2.1) with $m = 2$.

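To summarize some results obtained by Tang [1], let us state

Theorem 2. If $\chi_1'^2$ and $\chi_2'^2$ are independently distributed in $\chi'^2$ distributions with
$n_1$ and $n_2$ degrees of freedom and parameters $\lambda_1$ and $\lambda_2$, and if

$$E^2 = \frac{\chi_1'^2}{\chi_1'^2 + \chi_2'^2},$$

then the probability density of $E^2$ is

(2.2) $$p = p(E^2 \mid \lambda_1, \lambda_2, n_1, n_2) = e^{-\lambda_1 - \lambda_2} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_1, \alpha_2} \frac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}\, \Gamma\!\left(\frac{n_1 + n_2}{2} + \alpha_1 + \alpha_2\right)}{\alpha_1!\, \alpha_2!\, \Gamma\!\left(\frac{n_1}{2} + \alpha_1\right) \Gamma\!\left(\frac{n_2}{2} + \alpha_2\right)} (E^2)^{\alpha_1} (1 - E^2)^{\alpha_2}.$$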
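As a numerical illustration of the density (2.2), the following sketch evaluates the double series by truncation and checks that it integrates to one. The function names, the truncation point `terms`, and the midpoint rule are illustrative assumptions and are not part of the paper.

```python
import math

def poisson_weights(lam, terms):
    # e^{-lam} lam^a / a! for a = 0, ..., terms-1 (0.0**0 == 1.0 covers lam = 0)
    return [math.exp(-lam) * lam**a / math.factorial(a) for a in range(terms)]

def density_E2(x, lam1, lam2, n1, n2, terms=40):
    """Truncated evaluation of (2.2) at a point x = E^2 with 0 < x < 1."""
    w1 = poisson_weights(lam1, terms)
    w2 = poisson_weights(lam2, terms)
    total = 0.0
    for a1, u in enumerate(w1):
        if u == 0.0:
            continue
        for a2, v in enumerate(w2):
            if v == 0.0:
                continue
            p, q = n1 / 2 + a1, n2 / 2 + a2
            # log of Gamma(p+q) / (Gamma(p) Gamma(q)) x^(p-1) (1-x)^(q-1)
            log_term = (math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
                        + (p - 1) * math.log(x) + (q - 1) * math.log(1 - x))
            total += u * v * math.exp(log_term)
    return total

def integrate_unit_interval(fn, n=400):
    # Midpoint rule on (0, 1); avoids evaluating at the endpoints.
    h = 1.0 / n
    return h * sum(fn((i + 0.5) * h) for i in range(n))
```

With $\lambda_1 = \lambda_2 = 0$ only the term $\alpha_1 = \alpha_2 = 0$ survives, so the function reduces to the ordinary beta density.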





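By assigning certain values to $\lambda_1$ and $\lambda_2$ we obtain the following special cases
of (2.2):

(2.3) $$p_1 = p(E^2 \mid \lambda_1, 0, n_1, n_2) = e^{-\lambda_1} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_1} \frac{\lambda_1^{\alpha_1}\, \Gamma\!\left(\frac{n_1 + n_2}{2} + \alpha_1\right)}{\alpha_1!\, \Gamma\!\left(\frac{n_1}{2} + \alpha_1\right) \Gamma\!\left(\frac{n_2}{2}\right)} (E^2)^{\alpha_1},$$

(2.4) $$p_2 = p(E^2 \mid 0, \lambda_2, n_1, n_2) = e^{-\lambda_2} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_2} \frac{\lambda_2^{\alpha_2}\, \Gamma\!\left(\frac{n_1 + n_2}{2} + \alpha_2\right)}{\alpha_2!\, \Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2} + \alpha_2\right)} (1 - E^2)^{\alpha_2},$$

(2.5) $$p_0 = p(E^2 \mid 0, 0, n_1, n_2) = \frac{\Gamma\!\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1}.$$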


It is noted that (2.3) is Tang’s distribution (112) upon which the calculations
of his tables were based. To see this we need only make the correspondence

    This paper        Tang
    $\lambda_1$       $\lambda$
    $n_1, n_2$        $f_1, f_2$
    $\alpha_1$        $i$

We define $\epsilon$ to be the probability of an error of the first kind. Tang obtained
the critical values $E_\epsilon^2$ of $E^2$ by requiring that

$$P_I = \int_{E_\epsilon^2}^{1} p_0 \, dE^2 = \epsilon,$$

say .01 or .05. Then he calculated

$$P_{II} = \int_{0}^{E_\epsilon^2} p_1(E^2 \mid \lambda_1, 0, n_1, n_2) \, dE^2$$

using the values of $E_\epsilon^2$ obtained above. Hence $1 - P_{II}$ is the power of the test.
If, however, $\lambda_1 = 0$ but $\lambda_2 \neq 0$, then to find

$$P_{III} = \int_{E_\epsilon^2}^{1} p_2(E^2 \mid 0, \lambda_2, n_1, n_2) \, dE^2$$

we could make the transformation $G^2 = 1 - E^2$ and find

$$P_{III} = \int_{0}^{1 - E_\epsilon^2} p(G^2 \mid 0, \lambda_2, n_1, n_2) \, dG^2.$$

It is easy to verify that

$$p(G^2 \mid 0, \lambda_2, n_1, n_2) = p_1(E^2 \mid \lambda_2, 0, n_2, n_1)$$

if we put $G^2$ in place of $E^2$ in the latter density. It follows that to calculate $P_{III}$
it would be sufficient to have full tables of Tang’s distribution since

$$P_{III} = \int_{0}^{1 - E_\epsilon^2} p_1(E^2 \mid \lambda_2, 0, n_2, n_1) \, dE^2.$$

Tang’s tables are not however sufficiently extensive. Furthermore, tables of 
(2.2) are also necessary. As yet these tables do not exist. However, some useful 
conclusions can be drawn from the inequalities obtained in the following section. 

First, however, let us evaluate $n_1$, $n_2$, $\lambda_1$ and $\lambda_2$ for the generalized Student’s
hypothesis discussed in the introduction. It is easy to see that $n_1 = k - 1$
and $n_2 = k(N - 1)$. To evaluate $\lambda_1$ and $\lambda_2$ we note from Theorem 1 that we
only need substitute $E y_{i\nu}$ for $y_{i\nu}$ in $q_1$ and $q_2$, where

$$q_1 = N \sum_{i} (\bar{y}_i - \bar{y})^2, \qquad q_2 = \sum_{i,\nu} (y_{i\nu} - \bar{y}_i)^2.$$

Upon making these substitutions we obtain

$$\lambda_1 = \frac{N}{2\sigma^2} \sum_{i} (\bar{a}_i - \bar{a})^2, \qquad \lambda_2 = \frac{1}{2\sigma^2} \sum_{i,\nu} (a_{i\nu} - \bar{a}_i)^2.$$

Thus the various hypotheses concerning the $a_{i\nu}$ influence the distribution of $F$
or $E^2 = 1/(1 + F n_1/n_2)$ by affecting the values of $\lambda_1$ and $\lambda_2$.
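The substitution just described can be sketched in a few lines. The function below is a hypothetical helper, not part of the paper; it assumes $k$ classes with $N$ observations each, takes $\bar{a}_i$ to be the mean of the $a_{i\nu}$ over $\nu$ and $\bar{a}$ the mean of the $\bar{a}_i$, and obtains $\lambda_1$ and $\lambda_2$ by inserting the means into $q_1$ and $q_2$ and dividing by $2\sigma^2$.

```python
def noncentrality(means, sigma2):
    """Return (n1, n2, lambda1, lambda2) for a k x N array of means a_{i,nu}.

    means[i][v] is the expected value of the v-th observation in class i;
    sigma2 is the common variance. Layout and names are illustrative.
    """
    k = len(means)
    N = len(means[0])
    class_means = [sum(row) / N for row in means]   # a-bar_i
    grand_mean = sum(class_means) / k               # a-bar
    # q1 with E y substituted for y, divided by 2 sigma^2
    lam1 = N * sum((m - grand_mean) ** 2 for m in class_means) / (2 * sigma2)
    # q2 with E y substituted for y, divided by 2 sigma^2
    lam2 = sum((a - m) ** 2
               for row, m in zip(means, class_means)
               for a in row) / (2 * sigma2)
    return k - 1, k * (N - 1), lam1, lam2
```

For instance, means `[[-1, 1], [-1, 1]]` give equal class averages, so $\lambda_1 = 0$, while the variation of the means within each class makes $\lambda_2 > 0$; this is the situation in which the bias discussed in the paper arises.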


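4. Limits of the values of $p$. It follows readily from (2.2) that

(3.1) $$p = \frac{\Gamma\!\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} e^{-\lambda_1 - \lambda_2} \sum_{\alpha_1, \alpha_2} \frac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}}{\alpha_1!\, \alpha_2!} (E^2)^{\alpha_1} (1 - E^2)^{\alpha_2}\, C_{\alpha_1 \alpha_2},$$

where

$$C_{\alpha_1 \alpha_2} = \frac{\Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right) \Gamma\!\left(\frac{n_1 + n_2}{2} + \alpha_1 + \alpha_2\right)}{\Gamma\!\left(\frac{n_1}{2} + \alpha_1\right) \Gamma\!\left(\frac{n_2}{2} + \alpha_2\right) \Gamma\!\left(\frac{n_1 + n_2}{2}\right)}.$$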





Now if a > 0, b > 0, and j is an integer > 1 , we have 


0 + bX 1 + b + ~2)"'( 1+ b + 2(j- 1 )) < i 1 + b) ' 


Hence, it follows that 




Ml -f- 722 “f* 2«i 


Substituting we see that 


po <f ■ e 1 ' 1 < p < p 0 e -Xl-X2 


” expf lE -(SL±-) ra p[^«] + X ! (l - «(”^)} 

and 

(3.3) p, < p < p, exp [-X 2 + X 2 (l - £ 2 )(-iL±_2»^ + 2 ij]. 

Let $2 n_i \phi_i^2 = \lambda_i$, $i = 1, 2$.

Theorem 3. Le< « = / po d/? 2 so that e is the probability of an error of the first 
•**2 

kind. Then , /or all values of <f>\ 

(3.4) e > f\ p 2 dE 2 

J *2 

and if E 2 > ri\/{n\ + nf), it follows that 

r l 

(3.5) e > t expj —2rh<t>2 + 2*1(1 - £ 2 )(n, + n 2 )} > I jh dE 2 > . 

J *1 

Furthermore , /or all values of <f>l 

(3.6) £, p. rfjs 2 > Jr p rfi? 2 , 
and t/ 2£ 2 > (n x + 2)/(n x + r^), it follows that 

f VidEi > exp{— 2n 2 02 + 2^>1(1 — E])(ni + n*) f V\dE 2 

m *‘ r 1 r 1 

> [ vdE 2 > <r 2ni *i [vidE*. 

Jb* Jb\ 


Finally , t /7 can assume the two values 0 and 2 , it follows that if 

(3 ’ 8) 02 > 2(tf(», + n!)° g - (« 2 + ^j) > °’ 

then if y = 0 , 


J t pt dE 2 < tS 





and if 7 = 2 
(3.10) 


r 

JbI 


pdE 2 < 8 



Proof. To prove (3.4) and (3.6) it is only necessary to follow Daly’s [6] 
procedure . 2 Since 

exp{— 2 w 2 02 + 2</) 2 (1 ~ E 2 )(ni + n 2 ) + 7 ^ 2 } 


and 


exp {— ihtlE 2 } 

are decreasing functions of E 2 , and 

exp {— 2n 2 0 2 + 202 (1 — E 2 )(jh + 7i 2 ) + 7 ^ 2 ! 


if 


< 1 


E 2 

n\ + n 2 

the inequalities (3.5) and (3.7) follow immediately from (3.2) and (3.3). Finally 

exp {— 27?202 + 202(1 — E 2 )(jii + rh) + 70 2 j < 5 < 1 

if (3.8) is true, so that (3.9) and (3.10) follow. 

From (3.8), (3.9) and (3.10) we can calculate either a lower limit for the bias,
if we know $\phi_2^2$, or the upper limit that $\phi_2^2$ can have if we wish the bias to be not
greater than some given amount. Thus these limits do not answer the important
question of what is a value $\phi$ such that if $\phi_2^2 < \phi$ then the bias is less than
$(1 - \delta)\epsilon$. They only provide a value $\phi'$ of $\phi_2^2$ such that if $\phi_2^2 > \phi'$ then the bias is at
least $(1 - \delta)\epsilon$.

If, for example, $\delta = .5$ and $n_1 = 1$ as in the case of Student’s ratio, we have,
if $\gamma = 0$,

$$\phi_2^2 > \frac{.693}{2(n_2 E_\epsilon^2 - 1)},$$

and if $\epsilon = .05$, then $E_\epsilon^2$ decreases steadily from .903 if $n_2 = 2$, to .063 if $n_2 = 60$,
and the corresponding lower limits of $\phi_2^2$ decrease from .43 to .12. Thus, if
$\phi_2^2 > .43$ or $.12$ in these two cases, it follows that the probability of rejecting the
average null hypothesis will be not .05 but something less than .025.
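The arithmetic of this example is easy to check. The sketch below simply evaluates the displayed bound, with $.693$ read as $\log(1/\delta)$ for $\delta = .5$; the function name is an assumption made for illustration.

```python
import math

def phi2_squared_lower_limit(delta, n2, E2_crit):
    """Displayed bound for n1 = 1, gamma = 0: log(1/delta) / (2 (n2 E^2 - 1))."""
    return math.log(1.0 / delta) / (2.0 * (n2 * E2_crit - 1.0))

# The two cases quoted in the text (epsilon = .05, delta = .5).
limit_n2_2 = phi2_squared_lower_limit(0.5, 2, 0.903)
limit_n2_60 = phi2_squared_lower_limit(0.5, 60, 0.063)
```

Rounded to two decimals the two values agree with the limits .43 and .12 quoted above.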

If $\delta = .6$ and $n_1 = 4$, $n_2 = 30$, then we can evaluate the lower limit of $\phi_2^2$ for
the example given in the introduction, finding that

$$\phi_2^2 > \frac{.511}{2(.279)(34) - 8} = .05$$

implies a downward bias of at least 40 per cent of .05. Also, if $\phi_2^2 > .07$ then for


* The procedure followed is given in [6] on pp. 4, 5, equations (2.2) through Lemma 1. 





any value of $\phi_1$ the power of the analysis of variance test is reduced at least 40
per cent.

5. Conclusions. The rather sharp effects of a moderate lack of statistical
control on the probabilities associated with the F-test indicate the importance
of testing for statistical control outside of the industrial applications now made.
Furthermore, it would seem advisable to investigate tests and designs that are
less sensitive to the lack of control than is the F-test.

REFERENCES 

[1] P. C. Tang, “The power function of the analysis of variance tests with tables and
illustrations of their use,” Statistical Research Memoirs, Vol. 2 (1938), pp. 126-157.

[2] William G. Madow, “The distribution of quadratic forms in non-central normal
random variables,” Annals of Math. Stat., Vol. 9 (1940), pp. 100-104.

[3] W. G. Cochran, “The distribution of quadratic forms in a normal system, with
applications to the analysis of covariance,” Cambridge Phil. Soc. Proc., Vol. 30 (1934),
pp. 178-191.

[4] A. T. Craig, “Note on the independence of certain quadratic forms,” Annals of Math.
Stat., Vol. 14 (1943), pp. 195-197.

[5] S. Kolodziejczyk, “On an important class of statistical hypotheses,” Biometrika,
Vol. 27 (1935), pp. 161-190.

[6] J. F. Daly, “On the unbiased character of likelihood-ratio tests for independence in
normal systems,” Annals of Math. Stat., Vol. 11 (1940), pp. 1-33. The procedure
followed is given on pp. 4, 5, equations (2.2) through Lemma 1.



MIXTURE OF DISTRIBUTIONS 

By Herbert Robbins 

Department of Mathematical Statistics, University of North Carolina 

1. Summary. Mixtures of measures or distributions occur frequently in the 
theory and applications of probability and statistics. In the simplest case it 
may, for example, be reasonable to assume that one is dealing with the mixture 
in given proportions of a finite number of normal populations with different 
means or variances. The mixture parameter may also be denumerably infinite, 
as in the theory of sums of a random number of random variables, or continuous, 
as in the compound Poisson distribution. 

The operation of Lebesgue-Stieltjes integration, $\int f(x)\, d\mu$, is linear with
respect to both integrand $f(x)$ and measure $\mu$. The first type of linearity has as
its continuous analog the theorem of Fubini on interchange of order of integration;
the second type of linearity has a corresponding continuous analog which
is of importance whenever one deals with mixtures of measures or distributions,
and which forms the subject of the present paper. Other treatments of the
same subject have been given ([1], [2]; see also [3], [4]) but it is hoped that the
discussion given here will be useful to the mathematical statistician.

A general measure theoretic form of the fundamental theorem is given in 
Section 2, and in Section 3 the theorem is formulated in terms of finite dimen¬ 
sional spaces and distribution functions. The operation of convolution as an 
example of mixture is treated briefly in Section 4, while Section 5 is devoted to 
random sampling from a mixed population. 

We shall refer to Theory of the Integral by S. Saks (second edition, Warszawa,
1937) as [S], and the Mathematical Methods of Statistics by H. Cramér (Princeton,
1946) as [C].

2. Mixture of measures in general. Let $X$ $(Y)$ be a space with points $x$ $(y)$
and let $\mathfrak{X}$ $(\mathfrak{Y})$ be a $\sigma$-field of subsets of $X$ $(Y)$. Let $\nu$ be a measure on $\mathfrak{Y}$. Let
$\mu_y$ be for a.e. $(\nu)$ $y$ a measure on $\mathfrak{X}$, such that $\mu_y(S)$ is for every $S$ in $\mathfrak{X}$ a measurable
$(\mathfrak{Y})$ function of $y$. Define for every $S$ in $\mathfrak{X}$,

(1) $$\mu(S) = \int_Y \mu_y(S)\, d\nu.$$

Theorem 1. $\mu$ is a measure on $\mathfrak{X}$. If $\nu(Y) = \mu_y(X) = 1$, then $\mu(X) = 1$.

Proof. Clear.

Theorem 2. If $f(x)$ is any non-negative or non-positive function measurable
$(\mathfrak{X})$ then the function

(2) $$g(y) = \int_X f(x)\, d\mu_y$$

is measurable $(\mathfrak{Y})$, and

(3) $$\int_X f(x)\, d\mu = \int_Y g(y)\, d\nu.$$

Proof. First let $f_0(x)$ be any non-negative simple function [S, p. 7] of the
form

(4) $$f_0(x) = \{a_1, S_1; \cdots; a_k, S_k\}$$

where the $S_i$ are disjoint sets in $\mathfrak{X}$ such that $X = \sum_1^k S_i$, and the $a_i$ are non-negative
constants. Then

(5) $$g_0(y) = \int_X f_0(x)\, d\mu_y = \sum_1^k a_i\, \mu_y(S_i)$$

is a non-negative function measurable $(\mathfrak{Y})$, and from (1) it follows that each side
of (3) is equal to $\sum_1^k a_i\, \mu(S_i)$. Hence the theorem holds in this case.

Next let $f(x)$ be any non-negative function measurable $(\mathfrak{X})$; then [S, p. 14]
there exists a sequence $f_n(x)$ of simple functions such that for every $x$,

(6) $$0 \le f_1(x) \le f_2(x) \le \cdots; \qquad \lim_{n \to \infty} f_n(x) = f(x).$$

Setting

(7) $$g_n(y) = \int_X f_n(x)\, d\mu_y, \qquad g(y) = \int_X f(x)\, d\mu_y,$$

it follows from the theorem of monotone convergence [S, p. 28] and from the
preceding paragraph that

(8) $$\int_X f(x)\, d\mu = \lim_{n \to \infty} \int_X f_n(x)\, d\mu = \lim_{n \to \infty} \int_Y g_n(y)\, d\nu,$$

(9) $$g(y) = \lim_{n \to \infty} \int_X f_n(x)\, d\mu_y = \lim_{n \to \infty} g_n(y).$$

From (6) and (9) it follows that for a.e. $(\nu)$ $y$,

(10) $$0 \le g_1(y) \le g_2(y) \le \cdots; \qquad \lim_{n \to \infty} g_n(y) = g(y).$$

Hence $g(y)$ is measurable $(\mathfrak{Y})$, and from the theorem of monotone convergence,

(11) $$\int_Y g(y)\, d\nu = \lim_{n \to \infty} \int_Y g_n(y)\, d\nu.$$

Equation (3) now follows from (8) and (11).

By passing from $f(x)$ to $-f(x)$ we establish (3) when $f(x)$ is any non-positive
function measurable $(\mathfrak{X})$. This completes the proof of Theorem 2.

If $f(x)$ is an arbitrary function measurable $(\mathfrak{X})$ we define

(12) $$f^+(x) = \begin{cases} f(x) & \text{if } f(x) \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f^-(x) = \begin{cases} f(x) & \text{if } f(x) < 0, \\ 0 & \text{otherwise,} \end{cases}$$

so that

(13) $$f(x) = f^+(x) + f^-(x)$$

is the sum of two functions measurable $(\mathfrak{X})$ of constant sign. By Theorem 2 the
functions

(14) $$g_1(y) = \int_X f^+(x)\, d\mu_y, \qquad g_2(y) = \int_X f^-(x)\, d\mu_y$$

are measurable $(\mathfrak{Y})$ and

(15) $$0 \le \int_X f^+(x)\, d\mu = \int_Y g_1(y)\, d\nu \le \infty,$$

(16) $$0 \ge \int_X f^-(x)\, d\mu = \int_Y g_2(y)\, d\nu \ge -\infty.$$

The integral $\int_X f(x)\, d\mu$ exists if and only if at least one of the two quantities (15)
and (16) is finite [S, p. 20].

Theorem 3. A necessary and sufficient condition that

(17) $$\int_X f(x)\, d\mu = \int_Y \left\{ \int_X f(x)\, d\mu_y \right\} d\nu$$

is that at least one of the two quantities (15) and (16) be finite.

Proof. By the remark preceding Theorem 3 the condition is clearly necessary.
Now suppose, e.g., that (15) is finite; we must show that (17) holds. By hypothesis,

(18) $$\int_X f^+(x)\, d\mu < \infty, \qquad \int_X f(x)\, d\mu = \int_X f^+(x)\, d\mu + \int_X f^-(x)\, d\mu.$$

From (18) and (15) it follows that $0 \le g_1(y) < \infty$ for a.e. $(\nu)$ $y$, hence

(19) $$\int_X f(x)\, d\mu_y = \int_X f^+(x)\, d\mu_y + \int_X f^-(x)\, d\mu_y = g_1(y) + g_2(y)$$

exists for a.e. $(\nu)$ $y$. From the finiteness of (15) it follows that

(20) $$\int_Y (g_1(y) + g_2(y))\, d\nu = \int_Y g_1(y)\, d\nu + \int_Y g_2(y)\, d\nu$$

exists. Hence from (19), the integral

(21) $$\int_Y \left\{ \int_X f(x)\, d\mu_y \right\} d\nu = \int_Y (g_1(y) + g_2(y))\, d\nu$$

exists. Equation (17) now follows from (21), (20), (15), and (18). This completes
the proof of Theorem 3.

Corollary 1. If $\mu(X) < \infty$, and if $f(x)$ is bounded from above or from below,
then both sides of (17) exist and the equality holds.

Proof. If, say, $f(x) \le C < \infty$, then

$$0 \le \int_X f^+(x)\, d\mu \le |C| \cdot \mu(X) < \infty,$$

and the result follows from Theorem 3.

We shall now show by an example that the existence and even the finiteness of
the right side of (17) does not imply the existence of the left side.

Let $X = Y = \{1, 2, \cdots, n, \cdots\}$ and let $\mathfrak{X}$ $(\mathfrak{Y})$ consist of all subsets of $X$ $(Y)$.
Let $\nu$ be the measure which assigns mass $c_n$ to $n$, where the $c_n$ are positive constants
such that $\sum_1^\infty c_n = 1$. Let $\mu_n$ assign the mass $1/2n$ to each of the points
$1, 2, \cdots, 2n$. Let $f(x)$ be such that $f(1) = b_1$, $f(2) = -b_1$, $f(3) = b_2$, $f(4) = -b_2$, $\cdots$,
where the $b_n$ are positive constants. Then

$$\int_X f(x)\, d\mu_n = 0 \qquad (n = 1, 2, \cdots)$$

so that

$$\int_Y \left\{ \int_X f(x)\, d\mu_n \right\} d\nu = 0.$$

The measure $\mu$ defined by (1) assigns to each $n$ a positive value $\mu(n)$ given by

$$\mu(1) = \mu(2) = c_1 \cdot (2 \cdot 1)^{-1} + c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$$
$$\mu(3) = \mu(4) = c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$$
$$\cdots$$

where $\mu(X) = \sum_1^\infty \mu(n) = \sum_1^\infty c_n = 1$.

Now fix the $b_n$ and $c_n$ in such a way that

$$b_1 \mu(1) + b_2 \mu(3) + b_3 \mu(5) + \cdots = \infty.$$

Then

$$\int_X f^+(x)\, d\mu = - \int_X f^-(x)\, d\mu = \infty,$$

so that the left side of (17) does not exist, even though $\nu(Y) = \mu_y(X) = \mu(X) = 1$
and the right side of (17) exists and is equal to zero.
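One concrete choice of the constants makes the counterexample easy to check with exact arithmetic. In the sketch below the choices $c_n = 2^{-n}$ and $b_k = 2k/c_k$ are assumptions made for illustration; the text only requires that the series $b_1\mu(1) + b_2\mu(3) + \cdots$ diverge. Since $\mu(2k-1) \ge c_k/(2k)$, each term $b_k\,\mu(2k-1)$ is then at least 1.

```python
from fractions import Fraction

def c(n):
    # nu({n}) = 2^{-n}; these masses sum to 1 (illustrative choice)
    return Fraction(1, 2 ** n)

def b(k):
    # illustrative choice forcing divergence: b_k = 2k / c_k
    return 2 * k / c(k)

def f(x):
    # f(1) = b_1, f(2) = -b_1, f(3) = b_2, f(4) = -b_2, ...
    k = (x + 1) // 2
    return b(k) if x % 2 == 1 else -b(k)

def inner_integral(n):
    # integral of f with respect to mu_n, which puts mass 1/(2n) on 1, ..., 2n
    return sum(f(x) for x in range(1, 2 * n + 1)) * Fraction(1, 2 * n)

def mu_lower_bound(k):
    # mu(2k-1) = sum over n >= k of c_n / (2n), bounded below by its first term
    return c(k) / (2 * k)

def positive_part_lower_bound(K):
    # lower bound for the partial sum b_1 mu(1) + ... + b_K mu(2K-1)
    return sum(b(k) * mu_lower_bound(k) for k in range(1, K + 1))
```

Every inner integral vanishes, so the right side of (17) is zero, while the lower bound for the first $K$ terms of $\int_X f^+\, d\mu$ equals $K$ and so grows without limit.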


3. A restatement of the preceding results in the form most useful in probability
theory. Let $x = (x_1, \cdots, x_n)$ be a point in the $n$-dimensional Euclidean
space $R_n$, and let $B_n$ denote the $\sigma$-field of Borel sets in $R_n$. Let $S_x$ denote the
half-open interval in $R_n$ consisting of all points $(w_1, \cdots, w_n)$ in $R_n$ satisfying the
inequalities

(22) $$w_1 \le x_1, \cdots, w_n \le x_n;$$

then if $\mu$ is any probability measure on $B_n$ the function

(23) $$F(x) = \mu(S_x)$$

is the distribution function corresponding to $\mu$. Conversely, if $F(x)$ is any
distribution function in $R_n$ [C, p. 80] there is a unique probability measure $\mu$ on $B_n$
such that (23) holds. As a matter of notation we write for any Borel measurable
$f(x)$,

(24) $$\int_{R_n} f(x)\, d\mu = \int_{-\infty}^{\infty} f(x)\, dF(x)$$

provided the integral on the left exists.

Now let $y = (y_1, \cdots, y_m)$ be a point in $R_m$, let $G(y)$ be a distribution function,
and let $\nu$ denote the corresponding probability measure on $B_m$. Let $F(x, y)$
be for a.e. $(\nu)$ $y$ a distribution function in $x$, and for every $x$ a Borel measurable
function of $y$, and let $\mu_y$ be the corresponding probability measure on $B_n$.
Theorem 4. The function

(25) $$H(x) = \int_{-\infty}^{\infty} F(x, y)\, dG(y)$$

is a distribution function in $R_n$. Let $\mu$ denote the corresponding probability measure
on $B_n$. Then for any $S$ in $B_n$, $\mu_y(S)$ is a Borel measurable function of $y$ and

(26) $$\mu(S) = \int_{-\infty}^{\infty} \mu_y(S)\, dG(y).$$

Proof. Let $C$ denote the class of all Borel sets $S$ in $R_n$ such that $\mu_y(S)$ is a
Borel measurable function of $y$. We shall show that $C$ is a normal class [S, p. 83].

(i) If $S_1, S_2, \cdots$ is a sequence of disjoint sets in $C$ and if $S = \sum_1^\infty S_n$, then

$$\mu_y(S) = \mu_y\left(\sum_1^\infty S_n\right) = \sum_1^\infty \mu_y(S_n)$$

is a convergent series of Borel measurable functions and is therefore itself a Borel
measurable function.

(ii) If $S_1 \supset S_2 \supset \cdots$ is a decreasing sequence of sets in $C$ and if $S = \prod_1^\infty S_n$,
then

$$\mu_y(S) = \mu_y\left(\prod_1^\infty S_n\right) = \lim_{n \to \infty} \mu_y(S_n)$$

is the limit of a sequence of Borel measurable functions and is therefore a Borel
measurable function.

Hence $C$ is a normal class. But $C$ contains every interval $S_x$, for $\mu_y(S_x) =
F(x, y)$ was assumed to be a Borel measurable function of $y$ for every $x$. It
follows [S, p. 85] that $C = B_n$.

It now follows from Theorem 1 that the set function $\mu(S)$ defined by (26)
is a probability measure on $B_n$. The corresponding distribution function is the
function $H(x)$ defined by (25). Thus Theorem 4 is proved.





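Let $f(x) = f^+(x) + f^-(x)$ be any Borel measurable function. Then from
Theorem 2, the integrals

(27) $$\int_{-\infty}^{\infty} f^+(x)\, dH(x) = \int_{-\infty}^{\infty} f^+(x)\, d_x \left\{ \int_{-\infty}^{\infty} F(x, y)\, dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f^+(x)\, d_x F(x, y) \right\} dG(y),$$

(28) $$\int_{-\infty}^{\infty} f^-(x)\, dH(x) = \int_{-\infty}^{\infty} f^-(x)\, d_x \left\{ \int_{-\infty}^{\infty} F(x, y)\, dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f^-(x)\, d_x F(x, y) \right\} dG(y)$$

exist. The following theorem is an immediate consequence of Theorem 3 and
Corollary 1.

Theorem 5. A necessary and sufficient condition that

(29) $$\int_{-\infty}^{\infty} f(x)\, d_x \left\{ \int_{-\infty}^{\infty} F(x, y)\, dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f(x)\, d_x F(x, y) \right\} dG(y)$$

is that the left side of (29) exist; i.e. that at least one of the quantities (27) and (28)
be finite. This will be true in particular if $f(x)$ is bounded from above or from below.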


4. The operation of convolution. An example of the general mixture (25)
of distribution functions is the operation of convolution: if $F(x)$, $G(x)$ are two
distribution functions in $R_1$ then $F(x, y) = F(x - y)$ satisfies the conditions of
Theorem 4, so that

(30) $$H(x) = \int_{-\infty}^{\infty} F(x - y)\, dG(y)$$

is also a distribution function in $R_1$, denoted by

(31) $$H(x) = F(x) * G(x).$$

Corresponding to any distribution function $F(x)$ in $R_1$ is the characteristic
function

(32) $$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$$

which in turn uniquely determines $F(x)$ [C, p. 93].

Theorem 6. Let $F(x)$, $G(x)$, $H(x)$ be distribution functions in $R_1$ and let $\varphi_1(t)$,
$\varphi_2(t)$, $\varphi(t)$ be the corresponding characteristic functions. Then

(33) $$H(x) = F(x) * G(x)$$

if and only if

(34) $$\varphi(t) = \varphi_1(t) \cdot \varphi_2(t).$$





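Proof. Assume (33) holds. Since $|e^{itx}| \le 1$ we have from Theorem 5,

(35) $$\begin{aligned} \varphi(t) &= \int_{-\infty}^{\infty} e^{itx}\, d_x \left\{ \int_{-\infty}^{\infty} F(x - y)\, dG(y) \right\} \\ &= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} e^{itx}\, d_x F(x - y) \right\} dG(y) \\ &= \int_{-\infty}^{\infty} e^{ity} \left\{ \int_{-\infty}^{\infty} e^{it(x - y)}\, d_x F(x - y) \right\} dG(y) \\ &= \varphi_1(t) \int_{-\infty}^{\infty} e^{ity}\, dG(y) = \varphi_1(t) \cdot \varphi_2(t). \end{aligned}$$

The converse implication now follows from the fact that the characteristic function
of a distribution determines the latter uniquely.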
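Theorem 6 can be checked numerically in the simplest case of distributions concentrated on finitely many points, where the Stieltjes integrals reduce to sums. The sketch below is an illustration only; the two probability functions `F` and `G` are arbitrary examples, and the dictionary representation is an assumption.

```python
import cmath

def convolve(p, q):
    """Probability function of F * G for discrete p, q (dicts value -> mass)."""
    h = {}
    for x, px in p.items():
        for y, qy in q.items():
            h[x + y] = h.get(x + y, 0.0) + px * qy
    return h

def char_fn(pmf, t):
    """Characteristic function (32) of a discrete distribution at t."""
    return sum(mass * cmath.exp(1j * t * x) for x, mass in pmf.items())

# Two arbitrary example distributions and their convolution.
F = {0: 0.5, 1: 0.5}
G = {0: 0.2, 1: 0.3, 3: 0.5}
H = convolve(F, G)
```

For any real `t`, `char_fn(H, t)` agrees with `char_fn(F, t) * char_fn(G, t)` up to rounding error, which is relation (34).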

The importance of the operation $*$ in probability theory arises from the fact
that if $X$, $Y$ are independent random variables with respective distribution
functions $F(x)$, $G(x)$, and if $Z = X + Y$, then the distribution function $H(x)$ of $Z$
satisfies (33), since for any value of $a$,

(36) $$\begin{aligned} H(a) = P[X + Y \le a] &= \iint_{x + y \le a} dF(x)\, dG(y) \\ &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{a - y} dF(x) \right) dG(y) = \int_{-\infty}^{\infty} F(a - y)\, dG(y) = F(a) * G(a), \end{aligned}$$

the evaluation of the double integral by an iterated integral following from
Fubini’s theorem [S, pp. 76-88]. However, (33) may hold without $X$, $Y$ being
independent, and Theorem 6 shows that (34) will then hold also, and conversely.

An example where $H(x) = F(x) * G(x)$ without $X$, $Y$ being independent
has been given by Cramér [C, p. 317, exercise 2]. We shall give another. Let
points $O, A, \cdots, F$ in the $(x, y)$-plane be defined as follows:

$O = (0, 0)$, $A = (1, 1)$, $B = (1/2, 1)$, $C = (0, 1/2)$, $D = (1, 0)$,

$E = (1, 1/2)$, $F = (1/2, 0)$.

Let $f(x, y)$ have the value 2 inside the quadrilateral $OABC$ and the triangle $DEF$,
and 0 elsewhere. Then if $f(x, y)$ is the joint frequency function of $X$, $Y$ it is
easily seen that $X$ and $Y$ have uniform distributions on the intervals $0 < x < 1$,
$0 < y < 1$ respectively and that $Z = X + Y$ has the triangular distribution
given by (33), although $X$ and $Y$ are not independent.

It would be interesting to know what distribution functions $F(x)$ are such that
if $X$, $Y$, $Z = X + Y$ are random variables with the distribution functions $F(x)$,
$F(x)$, $F(x) * F(x)$ respectively, then $X$ and $Y$ are necessarily independent. A
rather trivial example of such a distribution function is the step function $F(x)$
with jumps of $\tfrac{1}{2}$ at the points $x = 0$ and $x = 1$. It can be shown (oral
communication by W. Hoeffding), in generalization of Cramér’s example, that no
absolutely continuous distribution function (e.g. the normal distribution function)
has this property.


5. The problem of random sampling from a mixed population. Let $G(v)$ be
a distribution function in the real variable $v$, and let $F(u, v)$ be for a.e. (relative
to the measure corresponding to $G$) $v$ a distribution function in the real variable
$u$, and for every $u$ a Borel measurable function of $v$. Let

(37) $$H(u) = \int_{-\infty}^{\infty} F(u, v)\, dG(v);$$

then by Theorem 4 $H(u)$ is a distribution function in $R_1$. Now define for
$x = (x_1, \cdots, x_n)$, $y = (y_1, \cdots, y_n)$

(38) $$H(x) = H(x_1) \cdots H(x_n), \qquad G(y) = G(y_1) \cdots G(y_n).$$

Both $H(x)$ and $G(y)$ are then distribution functions in $R_n$. In particular, $H(x)$
is the distribution function of a random sample of $n$ independent variates each
with the distribution function (37). Set

(39) $$F(x, y) = F(x_1, y_1) \cdots F(x_n, y_n);$$

then for a.e. (relative to the measure corresponding to $G$) $y$, $F(x, y)$ is a distribution
function in $x$, and for every $x$, $F(x, y)$ is a Borel measurable function of $y$.
By Fubini’s theorem we have

(40) $$\begin{aligned} H(x) &= \int_{-\infty}^{\infty} F(x_1, y_1)\, dG(y_1) \cdots \int_{-\infty}^{\infty} F(x_n, y_n)\, dG(y_n) \\ &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} F(x_1, y_1) \cdots F(x_n, y_n)\, dG(y_1) \cdots dG(y_n) \\ &= \int_{-\infty}^{\infty} F(x, y)\, dG(y). \end{aligned}$$

Thus $H(x)$ is itself a mixture in the sense of Theorem 4. It follows from Theorem
5 that for any Borel measurable function $f(x)$,

(41) $$\int_{-\infty}^{\infty} f(x)\, dH(x) = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f(x)\, d_x F(x, y) \right\} dG(y),$$

if and only if the left side of (41) exists. When written out in full (41) becomes

(42) $$\begin{aligned} &\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\, d_{x_1} \left\{ \int_{-\infty}^{\infty} F(x_1, y_1)\, dG(y_1) \right\} \cdots d_{x_n} \left\{ \int_{-\infty}^{\infty} F(x_n, y_n)\, dG(y_n) \right\} \\ &\quad = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\, d_{x_1} F(x_1, y_1) \cdots d_{x_n} F(x_n, y_n) \right\} dG(y_1) \cdots dG(y_n). \end{aligned}$$




Equation (41) is of particular interest in connection with the distribution
of a statistic $t = t(x_1, \cdots, x_n) = t(x)$. For any distribution function $J(x)$ let
$K(t \mid J)$ denote the distribution function of $t$ when $x$ has the distribution function
$J(x)$. If we set

(43) $$f(x) = \begin{cases} 1 & \text{if } t(x) \le t, \\ 0 & \text{otherwise,} \end{cases}$$

then

(44) $$K(t \mid J) = \int_{-\infty}^{\infty} f(x)\, dJ(x).$$

Hence from (41),

(45) $$\begin{aligned} K(t \mid H(x_1) \cdots H(x_n)) = K(t \mid H) &= \int_{-\infty}^{\infty} K(t \mid F(x, y))\, dG(y) \\ &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))\, dG(y_1) \cdots dG(y_n). \end{aligned}$$

As an example, let $t(x)$ be Student’s ratio

(46) $$t = \sqrt{n} \cdot \bar{x} / s,$$

let

(47) $$F(u, v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}(w - v)^2}\, dw,$$

and let

(48) $$G(v) = \begin{cases} 0 & \text{for } v < -a, \\ \tfrac{1}{2} & \text{for } -a \le v < a, \\ 1 & \text{for } a \le v. \end{cases}$$

Then $H(u)$ will be the distribution function of a mixture in equal proportions of
two normal populations with unit variances and with means $-a$, $a$ respectively,
and $K(t \mid H(x_1) \cdots H(x_n))$ will be the distribution function of $t$ in random
samples of $n$ from this non-normal population. On the other hand,
$K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))$ will be the distribution function of $t$ in sampling from successive
normal populations with unit variances and means $y_1, \cdots, y_n$ respectively.
Relation (45) now becomes

(49) $$K(t \mid H(x_1) \cdots H(x_n)) = \sum K(t \mid F(x_1, y_1) \cdots F(x_n, y_n)) / 2^n,$$

where the summation is over all $2^n$ sets $(y_1, \cdots, y_n)$, each $y_i$ being either $-a$
or $a$. Due to the complexity of $K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))$ (the frequency
function of which is discussed in a forthcoming paper by the author), relation
(49) is not very useful. In other cases (45) may afford a considerable simplification
in the evaluation of the distribution function of a statistic obtained in
random sampling from a mixed population.
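Relation (45), in the finite form (49), can be verified exactly in a discrete analogue. In the sketch below the two normal populations are replaced by two small discrete populations and Student's ratio by the sample range; these substitutions, the numerical masses, and all names are assumptions made so that both sides can be enumerated with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Two component populations indexed by v (discrete stand-ins for F(u, v)).
COMPONENTS = {
    -1: {-1: Fraction(1, 3), 0: Fraction(2, 3)},
    1: {0: Fraction(1, 4), 2: Fraction(3, 4)},
}
# G puts mass 1/2 on each value of v, as in (48).
WEIGHTS = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def mixed_population():
    """Discrete analogue of H(u) in (37)."""
    h = {}
    for v, w in WEIGHTS.items():
        for u, p in COMPONENTS[v].items():
            h[u] = h.get(u, 0) + w * p
    return h

def statistic(x):
    # sample range, an arbitrary nonlinear statistic t(x)
    return max(x) - min(x)

def K(t, populations):
    """P[t(x) <= t] when x_1, ..., x_n are independent with the given populations."""
    total = Fraction(0)
    for combo in product(*(pop.items() for pop in populations)):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        if statistic([u for u, _ in combo]) <= t:
            total += prob
    return total

def left_side(t, n):
    # random sample of n from the mixed population
    return K(t, [mixed_population()] * n)

def right_side(t, n):
    # average over the 2^n assignments (y_1, ..., y_n), as in (49)
    total = Fraction(0)
    for ys in product(WEIGHTS, repeat=n):
        weight = Fraction(1)
        for y in ys:
            weight *= WEIGHTS[y]
        total += weight * K(t, [COMPONENTS[y] for y in ys])
    return total
```

For `n = 3` the two sides agree exactly at every threshold `t`, as (45) requires.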

REFERENCES 

[1] W. Feller, “On the integro-differential equations of purely discontinuous Markoff
processes,” Am. Math. Soc. Trans., Vol. 48 (1940), p. 488.

[2] R. H. Cameron and W. T. Martin, “An unsymmetric Fubini theorem,” Am. Math.
Soc. Bull., Vol. 47 (1941), p. 121.

[3] P. R. Halmos, “The decomposition of measures,” Duke Math. Jour., Vol. 8 (1941),
p. 386.

[4] W. Feller, “On a general class of ‘contagious’ distributions,” Annals of Math. Stat.,
Vol. 14 (1943), p. 389.



SOME APPLICATIONS OF THE MELLIN TRANSFORM IN STATISTICS 

By Benjamin Epstein 

Coal Research Laboratory , Carnegie Institute of Technology 

1. Summary. It is well known that the Fourier transform is a powerful
analytical tool in studying the distribution of sums of independent random variables.
In this paper it is pointed out that the Mellin transform is a natural analytical
tool to use in studying the distribution of products and quotients of independent
random variables. Formulae are given for determining the probability density
functions of the product $\xi\eta$ and the quotient $\xi/\eta$, where $\xi$ and $\eta$ are independent
positive random variables with p.d.f.’s $f(x)$ and $g(y)$, in terms of the Mellin
transforms $F(s) = \int_0^\infty f(x)\, x^{s-1}\, dx$ and $G(s) = \int_0^\infty g(y)\, y^{s-1}\, dy$. An extension of the
transform technique to random variables which are not everywhere positive is
given. A number of examples including Student’s $t$-distribution and Snedecor’s
$F$-distribution are worked out by the technique of this paper.

2. Introduction. It is well known [2], [3] that the Fourier transform is a 
useful analytical tool for studying the distribution of the sums of independent 
random variables. It is our purpose in this paper to study another transform 
which is useful in studying the distribution of the product of independent random 
variables. While it is perfectly true that one can reduce the study of the distribution
of the random variable $\xi = \xi_1 \xi_2 \cdots \xi_n$, the product of $n$ independent
random variables $\xi_1, \xi_2, \cdots, \xi_n$, to the study of the distribution of the random
variable $\eta = \log \xi = \log \xi_1 + \log \xi_2 + \cdots + \log \xi_n$, the sum of $n$ independent
random variables, it seems worth while to study the distribution problem directly.
There are advantages inherent in the direct attack on the distribution problem 
which are lost to a considerable degree, if the problem is so transformed that the 
Fourier transform becomes applicable. In this paper we shall show that the 
direct application of the Mellin transform to the study of the distribution of 
products of independent random variables yields results of interest. 

3. Connection between Mellin transforms and products of independent
random variables. The importance of Fourier transforms in studying the
distribution of sums of independent random variables rests on the
following result: if $\xi_1$ and $\xi_2$ are independent random variables with continuous¹
probability density functions (henceforth abbreviated as p.d.f.), $f_1(x)$ and $f_2(x)$,
respectively, then the p.d.f. $f(x)$ of the random variable $\xi = \xi_1 + \xi_2$ is expressible¹
as

(1) $$f(x) = \int_{-\infty}^{\infty} f_1(x - y) f_2(y)\, dy = \int_{-\infty}^{\infty} f_2(x - y) f_1(y)\, dy.$$

¹ In this paper we shall assume throughout that we are dealing with random variables
with continuous p.d.f.’s. The argument can be extended with some changes to distribution
functions which are perfectly general, but for simplicity this will not be done here.

But since these expressions are just the Fourier convolutions of $f_1(x)$ and $f_2(x)$,
it is small wonder that the Fourier transform plays such a basic role in studying
the distribution properties of sums of independent random variables.

Consider now the following result for products of independent random variables
[4], [5]: if $\xi_1$ is a random variable with continuous p.d.f. $f_1(x)$ and $\xi_2$, independent
of $\xi_1$, is a positive random variable with continuous p.d.f. $f_2(x)$, then the p.d.f.
$f(x)$ of the random variable $\xi = \xi_1 \xi_2$ is expressible² as

(2) $$f(x) = \int_0^{\infty} \frac{1}{y}\, f_1\!\left(\frac{x}{y}\right) f_2(y)\, dy.$$

But equation (2) is precisely in the form of a Mellin convolution of $f_1(x)$ and $f_2(x)$
and therefore it may be expected that the Mellin transform should be useful in
studying the distribution of products of independent random variables.

It is useful to indicate briefly the properties of the Mellin transform. A detailed
treatment of this transform will be found in [6] and we shall, therefore,
stress only those portions of the theory of Mellin transforms which are of importance
in the field of statistics. By definition, the Mellin transform $F(s)$,
corresponding to a function $f(x)$ defined only³ for $x \ge 0$, is

(3) $$F(s) = \int_0^{\infty} f(x)\, x^{s-1}\, dx.$$

Under certain restrictions on $f(x)$ [6, p. 47], $F(s)$ considered as a function of the
complex variable $s$ is a function of exponential type, analytic in a strip parallel
to the imaginary axis. The width of the strip is governed by the order of
magnitude of $f(x)$ in the neighborhood of the origin and for large values of $x$ and,
in particular, the strip of analyticity becomes a half-plane if $f(x)$ decays
exponentially as $x \to \infty$. There is a reciprocal formula enabling one to go from the
transform $F(s)$ to the function $f(x)$. This transformation is:

(4) $$f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} x^{-s} F(s)\, ds$$

for all $x$ where $f(x)$ is continuous and where the path of integration is any line
parallel to the imaginary axis and lying within the strip of analyticity of $F(s)$.

² More generally [4, p. 411], if $\xi_1$ and $\xi_2$ are independent random variables with continuous
p.d.f.’s $f_1(x)$ and $f_2(x)$, then the p.d.f. of the random variable $\xi = \xi_1 \xi_2$ is expressible as:

(2') $$f(x) = \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_1\!\left(\frac{x}{y}\right) f_2(y)\, dy = \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_2\!\left(\frac{x}{y}\right) f_1(y)\, dy.$$

In [4] analogous results are given for random variables with perfectly general distribution
functions.

³ The reason for this restriction is that there are technical difficulties in defining a Mellin
transform directly for a function defined over $(-\infty, \infty)$. In [6], for instance, the Mellin
transform theory is given for functions defined only for positive values of the argument.
In statistical terminology this means that we are restricting ourselves for the moment to
positive random variables. This is, of course, an unnatural restriction and we shall
indicate later in the paper a simple device for treating such questions.





If, in particular, we are interested in applying Mellin transforms to p.d.f/s 
of positive 4 random variables, the analysis can be carried out rigorously. Also, 
as in the case of the Fourier transform, one has the desirable property that there 
is a one-one correspondence between p.d.f.’s and their transforms. 

A number of common p.d.f.’s of positive random variables have simple Mellin 
transforms. For example see Table 1. 

In terms familiar to the mathematical statistician, the Mellin transform of a 
positive random variable £ with continuous p.d.f. f(x ) is where 

(5) F(s) = EiC 1 ) = (~ x-'fix) dx. 

Jo 

The following three basic properties hold: (i) The positive random variable 
rj = a £, a > 0 has the Mellin transform G(s) = a ' -1 F(s). This is immediate 
since 

( 6 ) G(s) = Eiv'- 1 ) = E(a~ l C') = a * -1 F(s). 

(ii) The positive random variable 77 = £“ has the Mellin transform G(s) = 
F(as — a + 1 ). To prove this we note that 

(7) G(s) = E{rT l ) = E(Z a ~ a ) - F(as - a + 1). 


In particular if a = — 1, i.e., 77 



G(s) - F(-s + 2 ). 


This is a result which we shall have occasion to use later in the paper. 

(iii) If $\xi_1$ and $\xi_2$ are independent positive random variables with Mellin transforms
$F_1(s)$ and $F_2(s)$, respectively, then the Mellin transform of the product $\eta = \xi_1\xi_2$
is $G(s) = F_1(s)F_2(s)$. This is immediate since

(8) $G(s) = E(\eta^{s-1}) = E[(\xi_1\xi_2)^{s-1}] = E(\xi_1^{s-1})\,E(\xi_2^{s-1}) = F_1(s)F_2(s).$

More generally if $\xi_1, \xi_2, \cdots, \xi_n$ are independent positive random variables with
Mellin transforms $F_1(s), F_2(s), \cdots, F_n(s)$, then the Mellin transform of the
random variable $\eta = \xi_1\xi_2\cdots\xi_n$ is $G(s) = F_1(s)F_2(s)\cdots F_n(s)$. This relationship
is fundamental and justifies the introduction of Mellin transforms in
studying products of independent random variables.
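Properties (i), (ii) and (iii) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the unit exponential density $f(x) = e^{-x}$ (whose Mellin transform is $\Gamma(s)$), evaluates all integrals by the trapezoidal rule after the substitution $x = e^u$, and uses helper names (`trapezoid`, `mellin`, `scaled`, `reciprocal`, `product`) that are purely illustrative.

```python
import math

def trapezoid(g, lo=-20.0, hi=20.0, h=0.05):
    """Trapezoidal rule for a smooth, rapidly decaying integrand on [lo, hi]."""
    n = int(round((hi - lo) / h))
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def mellin(f, s):
    """F(s) = int_0^inf x^(s-1) f(x) dx, computed with x = exp(u)."""
    return trapezoid(lambda u: math.exp(u * s) * f(math.exp(u)))

def f(x):
    """Assumed test density: unit exponential, F(s) = Gamma(s)."""
    return math.exp(-x)

def scaled(a):
    """Density of eta = a * xi, property (i)."""
    return lambda y: f(y / a) / a

def reciprocal(y):
    """Density of eta = 1 / xi, property (ii) with a = -1."""
    return f(1.0 / y) / (y * y)

def product(y):
    """Density of xi_1 * xi_2 for two independent copies, formula (10)."""
    return trapezoid(lambda u: f(y * math.exp(-u)) * f(math.exp(u)))

if __name__ == "__main__":
    print(mellin(f, 1.5), math.gamma(1.5))                      # F(s)
    print(mellin(scaled(3.0), 2.5), 3.0 ** 1.5 * math.gamma(2.5))  # a^(s-1) F(s)
    print(mellin(reciprocal, 0.5), math.gamma(1.5))             # F(-s + 2)
    print(mellin(product, 1.5), math.gamma(1.5) ** 2)           # F_1(s) F_2(s)
```

Each printed pair should agree to many decimal places; the product check is the slowest because it nests one quadrature inside another.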

From (8) it is clear that we can find the p.d.f. $g(y)$ of the random variable
$\eta$ which is the product of two positive independent random variables $\xi_1$ and $\xi_2$
with continuous p.d.f.'s $f_1(x)$ and $f_2(x)$. In fact, by the Mellin inversion formula

(9) $g(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}G(s)\,ds = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}F_1(s)F_2(s)\,ds,$


4 See footnote 3. 



TABLE 1 


APPLICATION OF MELLIN TRANSFORM 




where the path of integration is any line parallel to the imaginary axis and lying 
within the strip of analyticity of G(s). As in the case of characteristic functions, 
it can be shown that there is a one-one correspondence between p.d.f.’s and their 
Mellin transforms. Therefore, it follows that the p.d.f. g(y) computed in this 
way must be precisely equal to 

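(10) $g(y) = \int_0^\infty \frac{1}{x}\,f_2\!\left(\frac{y}{x}\right) f_1(x)\,dx = \int_0^\infty \frac{1}{x}\,f_1\!\left(\frac{y}{x}\right) f_2(x)\,dx.$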

It is easy to verify this directly by showing that the Mellin transform of the 
right-hand side of ( 10 ) is Fi(s) F 2 (s) [ 6 , p. 52], but this will not be done here. 
The essential point is that Equation (9), (which is sometimes easier to evaluate 
than Equation (10)), is a consequence of an algebraic formalism which is 
capable of revealing relationships which would otherwise remain hidden. 

The p.d.f. $h(y)$ of $\eta = \xi_1/\xi_2$, the ratio of two positive independent random variables with
continuous p.d.f.'s, can be found by reducing the problem to that of the product of the
independent random variables $\xi_1$ and $1/\xi_2$. If $F_1(s)$ and $F_2(s)$ are the Mellin transforms
corresponding to $\xi_1$ and $\xi_2$, respectively, then by (ii) $F_2(-s + 2)$ is the Mellin
transform of $1/\xi_2$ and, therefore, the Mellin transform $H(s)$ of $\eta = \xi_1/\xi_2$ is
$F_1(s)F_2(-s + 2)$. Therefore, the p.d.f. $h(y)$ of $\eta$ is

(11) $h(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}H(s)\,ds = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}F_1(s)F_2(-s + 2)\,ds.$

This formula is useful in finding distributions such as Student’s t and Fisher’s z. 

4. A modified Mellin transform procedure for finding the distribution of the 
product of independent random variables which are not everywhere positive. 

Up to this point we have limited ourselves to the application of the Mellin 
transform to finding the distribution of the product or ratio of two positive 
independent random variables. While it is true that a number of interesting 
probability density functions are defined only for positive⁵ values of the argument,
it is certainly desirable that we be able to treat situations involving random variables
capable of taking on both positive and negative values. A simple device
for extending the Mellin transform treatment to the more general problem is to
decompose the p.d.f.'s $f_1(x)$ and $f_2(x)$ of the independent random variables
$\xi_1$ and $\xi_2$ into

$f_1(x) = f_{11}(x) + f_{12}(x),$

$f_2(x) = f_{21}(x) + f_{22}(x),$

5 For example, distributions of type 3, the $\chi^2$ distribution, the distribution of the sample
standard deviation and sample variance, the distribution of an even power of a random
variable, etc. are all defined only for positive values of the argument.





where⁶

$f_{11}(x) = 0$, $x < 0$, $\quad f_{12}(x) = 0$, $x > 0$,

$f_{21}(x) = 0$, $x < 0$, $\quad f_{22}(x) = 0$, $x > 0$,

and then to operate on the pairs $[f_{11}(x), f_{21}(x)]$, $[f_{11}(x), f_{22}(x)]$, $[f_{12}(x), f_{21}(x)]$, and
$[f_{12}(x), f_{22}(x)]$ separately. More specifically, the frequency distribution $h(y)$
corresponding to the random variable $\eta = \xi_1\xi_2$ is made up of the sum of four
components $h_1(y)$, $h_2(y)$, $h_3(y)$, and $h_4(y)$. To compute $h_1(y)$ one can apply
the Mellin transform directly to the evaluation of the expression

$h_1(y) = \int_0^\infty \frac{1}{x}\,f_{21}\!\left(\frac{y}{x}\right) f_{11}(x)\,dx,$

since both $f_{11}(x)$ and $f_{21}(x)$ are zero for negative values of $x$. The function $h_1(y)$
is zero for $y < 0$. To compute $h_2(y)$ we first evaluate

$h_2^*(y) = \int_0^\infty \frac{1}{x}\,f_{11}\!\left(\frac{y}{x}\right) f_{22}(-x)\,dx.$

Again $f_{11}(x)$ and $f_{22}(-x)$ are zero for negative values of $x$ and, therefore, the conventional
Mellin transform can be applied in determining $h_2^*(y)$. It is clear that
$h_2^*(y) = 0$ for $y < 0$ and, therefore, $h_2(y) = h_2^*(-y) = 0$ for $y > 0$. Similarly,
one can find $h_3(y)$ and $h_4(y)$, where $h_3(y) = 0$ for $y > 0$ and $h_4(y) = 0$ for $y < 0$,
and it is readily seen that⁷

$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$

is the desired p.d.f. of $\eta = \xi_1\xi_2$.

5. Examples of use of Mellin transforms in evaluating the product and
quotient of independent random variables. Example 1: The distribution of
$\eta = \xi_1\xi_2$, where $\xi_1$ and $\xi_2$ are independent random variables with p.d.f.'s $f_1(x)$
and $f_2(x)$, respectively, where

$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty.$

In this case

$f_1(x) = f_{11}(x) + f_{12}(x),$

and

$f_2(x) = f_{21}(x) + f_{22}(x),$


6 Of course, $f_{11}$, $f_{12}$, $f_{21}$, and $f_{22}$ are generally not p.d.f.'s since $\int_0^\infty f_{11}(x)\,dx$, $\int_{-\infty}^0 f_{12}(x)\,dx$,
$\int_0^\infty f_{21}(x)\,dx$, $\int_{-\infty}^0 f_{22}(x)\,dx$ are no longer necessarily equal to one.

7 As in footnote 6, $h_1$, $h_2$, $h_3$, and $h_4$ are, in general, not p.d.f.'s.





where

$f_{11}(x) = 0$, $x < 0$; $\quad f_{12}(x) = 0$, $x > 0$;
$f_{21}(x) = 0$, $x < 0$; $\quad f_{22}(x) = 0$, $x > 0$.

The random variable $\eta = \xi_1\xi_2$ has a p.d.f. $h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$,
where

$h_1(y)$ is associated with $[f_{11}(x), f_{21}(x)]$,

$h_2(y)$ is associated with $[f_{11}(x), f_{22}(x)]$,

$h_3(y)$ is associated with $[f_{12}(x), f_{21}(x)]$,
and $h_4(y)$ is associated with $[f_{12}(x), f_{22}(x)]$.

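It is sufficient to evaluate

$h_1(y) = \int_0^\infty \frac{1}{x}\,f_{21}\!\left(\frac{y}{x}\right) f_{11}(x)\,dx = \int_0^\infty \frac{1}{x}\,f_{11}\!\left(\frac{y}{x}\right) f_{21}(x)\,dx.$

In this case

$F_{11}(s) = \int_0^\infty x^{s-1}f_{11}(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_0^\infty x^{s-1}e^{-x^2/2}\,dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2),$

analytic for $\mathrm{Re}(s) > 0$,
and

$F_{21}(s) = \int_0^\infty x^{s-1}f_{21}(x)\,dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2).$

Therefore,

$H_1(s) = F_{11}(s)F_{21}(s) = \frac{2^{s-3}}{\pi}\,\Gamma^2(s/2),$

$h_1(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}H_1(s)\,ds = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}\,\frac{2^{s-3}}{\pi}\,\Gamma^2(s/2)\,ds, \quad c > 0,$

$\qquad = \frac{1}{2\pi}K_0(y), \quad y > 0$ [6, p. 197],

where $K_0(y)$ is Bessel's function of the second kind with a purely imaginary argument
of zero order. Similarly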


$h_2(y) = h_3(y) = \frac{1}{2\pi}K_0(-y)$, $y < 0$, and $h_4(y) = \frac{1}{2\pi}K_0(y)$, $y > 0$.

Therefore, $h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$

$\qquad = \frac{1}{\pi}K_0(|y|), \quad -\infty < y < \infty,$

and this is the desired p.d.f. This result has been found by other methods and
is given in [1, p. 1].
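The result of Example 1 can be compared with a direct numerical evaluation of formula (10). The sketch below is illustrative only and rests on two assumptions not stated in the paper: $K_0$ is computed from the standard integral representation $K_0(y) = \int_0^\infty e^{-y\cosh t}\,dt$, and both integrals are approximated by the trapezoidal rule. The function names are hypothetical.

```python
import math

def trapezoid(g, lo, hi, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def bessel_k0(y):
    """K_0(y) for y > 0 via K_0(y) = int_0^inf exp(-y cosh t) dt."""
    return trapezoid(lambda t: math.exp(-y * math.cosh(t)), 0.0, 20.0, 4000)

def phi(x):
    """Standard normal density, the f_1 = f_2 of Example 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def h1(y):
    """h_1(y) = int_0^inf (1/x) phi(y/x) phi(x) dx for y > 0, using x = exp(u)."""
    return trapezoid(lambda u: phi(y * math.exp(-u)) * phi(math.exp(u)),
                     -15.0, 15.0, 3000)

def product_density(y):
    """h(y) for y != 0: exactly two of the four components are nonzero,
    and by the symmetry of phi each of them equals h1(|y|)."""
    return 2.0 * h1(abs(y))

if __name__ == "__main__":
    for y in (0.5, 1.0, -2.0):
        print(y, product_density(y), bessel_k0(abs(y)) / math.pi)
```

The two printed columns should coincide, which is the statement $h(y) = K_0(|y|)/\pi$.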

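Example 2: The distribution of $\eta = \xi_1/\xi_2$, where $\xi_1$ and $\xi_2$ are independent random
variables with p.d.f.'s $f_1(x)$ and $f_2(x)$, respectively, where

$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty.$

As in Example 1, one splits the determination of $h(y)$, the p.d.f. of $\eta$, into four
parts: $h_1(y)$, $h_2(y)$, $h_3(y)$, $h_4(y)$. In the notation of Example 1 it is easy to show
that $H_1(s)$, the Mellin transform of $h_1(y)$, is

$F_{11}(s)F_{21}(-s + 2) = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2)\cdot\frac{2^{(-s-1)/2}}{\sqrt{\pi}}\,\Gamma(-s/2 + 1) = \frac{1}{4\sin(s\pi/2)};$

$h_1(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}H_1(s)\,ds, \quad 0 < c < 2,$

$\qquad = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{y^{-s}\,ds}{4\sin(s\pi/2)} = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \quad y > 0.$

Similarly

$h_2(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \quad y < 0,$

$h_3(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \quad y < 0,$

$h_4(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \quad y > 0.$

Therefore,

$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\,\frac{1}{1 + y^2}, \quad -\infty < y < \infty.$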

This result has been found by other methods and is given in [4, p. 411].

Example 3: F-Distribution. Let $\xi_1, \cdots, \xi_m, \eta_1, \cdots, \eta_n$ be $(m + n)$ independent
random variables, each normally distributed with mean zero and standard
deviation $\sigma$. Let

$\xi = \sum_{i=1}^{m}\xi_i^2, \quad \eta = \sum_{j=1}^{n}\eta_j^2.$

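We want to find the p.d.f. $h(z)$ of $\zeta$ where $\zeta = \xi/\eta$. The p.d.f.'s $f(x)$ and $g(y)$
of $\xi$ and $\eta$, respectively, are:

$f(x) = \frac{x^{m/2-1}e^{-x/2\sigma^2}}{2^{m/2}\sigma^m\,\Gamma(m/2)}, \quad x > 0,$

$g(y) = \frac{y^{n/2-1}e^{-y/2\sigma^2}}{2^{n/2}\sigma^n\,\Gamma(n/2)}, \quad y > 0.$

In this case

$F(s) = \frac{2^{s-1}\sigma^{2s-2}\,\Gamma\!\left(s + \frac{m}{2} - 1\right)}{\Gamma(m/2)}$, analytic for $\mathrm{Re}(s) > 1 - \frac{m}{2}$,

$G(s) = \frac{2^{s-1}\sigma^{2s-2}\,\Gamma\!\left(s + \frac{n}{2} - 1\right)}{\Gamma(n/2)}$, analytic for $\mathrm{Re}(s) > 1 - \frac{n}{2}$.

The p.d.f. $h(z)$ has Mellin transform

$H(s) = F(s)\,G(-s + 2) = \frac{\Gamma\!\left(s + \frac{m}{2} - 1\right)\Gamma\!\left(-s + \frac{n}{2} + 1\right)}{\Gamma(m/2)\,\Gamma(n/2)}.$

Therefore,

$h(z) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} z^{-s}H(s)\,ds, \quad -\frac{m}{2} + 1 < c < \frac{n}{2} + 1,$

$\qquad = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma(m/2)\,\Gamma(n/2)}\,\frac{z^{m/2-1}}{(z + 1)^{(m+n)/2}}, \quad z > 0.$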

A convenient way of carrying out the inversion is to use formula (d) in Table 1 . 
In a similar way one can find Student’s distribution, i.e., the distribution of 

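$\zeta = \xi_0/\eta$, where $\eta = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\xi_i^2}$, and where $\xi_0, \xi_1, \cdots, \xi_n$ are $n + 1$ independent
random variables each having the distribution:

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}, \quad -\infty < x < \infty.$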
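The closed form obtained in Example 3 can be compared with the ratio density computed directly from $h(z) = \int_0^\infty x\,f(zx)\,g(x)\,dx$. The following sketch is an illustration under stated assumptions, not the author's computation: it takes $\sigma = 1$, picks small values of $m$ and $n$ arbitrarily, and uses trapezoidal quadrature after the substitution $x = e^u$. All function names are hypothetical.

```python
import math

def trapezoid(g, lo, hi, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def sum_of_squares_density(x, k):
    """p.d.f. of the sum of k squared N(0, 1) variables (sigma = 1 assumed)."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def ratio_density_numeric(z, m, n):
    """h(z) = int_0^inf x f(z x) g(x) dx, evaluated with x = exp(u)."""
    def integrand(u):
        x = math.exp(u)
        return x * x * sum_of_squares_density(z * x, m) * sum_of_squares_density(x, n)
    return trapezoid(integrand, -30.0, 8.0, 3800)

def ratio_density_closed(z, m, n):
    """Closed form of Example 3."""
    c = math.gamma((m + n) / 2) / (math.gamma(m / 2) * math.gamma(n / 2))
    return c * z ** (m / 2 - 1) / (1 + z) ** ((m + n) / 2)

if __name__ == "__main__":
    for z, m, n in ((0.7, 3, 5), (2.0, 4, 2)):
        print(z, m, n, ratio_density_numeric(z, m, n), ratio_density_closed(z, m, n))
```

Because $\sigma$ cancels in $H(s)$, the choice $\sigma = 1$ loses no generality for this check.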




It should be mentioned in conclusion that the Mellin transform is a natural 
tool to use in situations involving the products and quotients of independent 
uniformly distributed random variables, or in finding products and/or quotients 
and/or Beta-distribution. In such cases formulae (b), (c) and (d) in Table 1 
are useful. 


REFERENCES 

[1] C. C. Craig, "On the frequency function of xy," Annals of Math. Stat., Vol. 7 (1936),
pp. 1-15.

[2] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in
Mathematics, No. 36, Cambridge, 1937.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] J. H. Curtiss, "On the distribution of the quotient of two chance variables," Annals
of Math. Stat., Vol. 12 (1941), pp. 409-421.

[5] E. V. Huntington, "Frequency distribution of product and sum," Annals of Math.
Stat., Vol. 10 (1939), pp. 195-198.

[6] E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals, Clarendon Press,
Oxford, 1937.



THE ESTIMATION OF LINEAR TRENDS 

By G. W. Housner and J. F. Brennan 
California Institute of Technology 

1. Summary. This paper deals with the problem of bivariate regression 
where both variates are random variables having a finite number of means dis¬ 
tributed along a straight line. A regression statistic is derived which is inde¬ 
pendent of change in scale so that a prior knowledge of the frequency distribution 
parameters is not required in order to obtain a unique estimate. The statistic 
is shown to be consistent. The efficiency of the estimate is discussed and its 
asymptotic distribution is derived for the case when the random variables 
are normally distributed. A numerical example is presented which compares 
the performance of the statistic of this paper with that of other commonly used 
statistics. In the example it is found that the method of estimation proposed 
in this paper is more efficient. 

2. Introduction. A problem that often arises in statistical work is the estima¬ 
tion of linear trends. In the general problem it is known or presumed that a 
linear functional relation exists among a set of variables of the form, 

$a + b_1X + b_2Y + b_3Z + \cdots = 0.$

The observed values of the variables are of the form

$x_{ik} = X_i + \epsilon_{ik}, \quad y_{ik} = Y_i + \eta_{ik}$, etc.

That is, the $x_{ik}$ are random variables with means $X_i$ and $k = 1, 2, \cdots, N_i$ observed
values of $x$ are associated with the mean $X_i$. The ordering of the $X_i$ is according
to magnitude. Similarly there are the observed values $y_{ik}$, $z_{ik}$ and so forth.
The $\epsilon_{ik}$ are random variables, with the same distribution for all $i$, with zero
means. On the basis of a sample $O_n(x_{ik}, y_{ik}, z_{ik}, \cdots)$ it is desired to estimate
the coefficients $a, b_1, b_2, b_3, \cdots$. One method used to estimate the coefficients
is that of “weighted regression” which is essentially an application of the method 
of least squares. The problem has been studied by R. Allen, A. Wald and 
others. 1 The chief difficulty has been that the proposed methods of estimation 
require an a priori knowledge of the variances of the random variables. Wald 
has proposed a statistic which avoids this difficulty but which may have a rela¬ 
tively low efficiency in cases often encountered in practice. In this paper there 
is described a bivariate statistic which appears to have comparatively high pre¬ 
cision and which does not require prior knowledge of the variances of the random 
variables. A numerical example is given at the end of the paper to illustrate the 
comparative performances of different methods of estimation. 

1 For a brief history of work done on this problem see the paper by A. Wald in the Annals
of Math. Stat., Vol. 11 (1940), p. 284.




3. The Regression statistic. In the case of the bivariate problem, consider a
sample

$O_n(x_{ik}, y_{ik}), \quad i = 1, 2, \cdots, n$

and

$k = 1, 2, \cdots, N_i,$

where $N_i$ sample values $x_{ik}$, $y_{ik}$ are distributed about the means $X_i$, $Y_i$. Let the
means be related by $Y_i = a + bX_i$ and let the random variables $x_i$ be independent
and have the same frequency distribution with variance $\sigma_1^2$ for all $i$ and the random
variables $y_i$ have independent frequency distributions with variance $\sigma_2^2$
the same for all $i$. An appropriate statistic for estimating $b$ is obtained by noting
that a pair of sample points $(x_{ik}, y_{ik})$, $(x_{jl}, y_{jl})$ gives a sample value of the
change in $y$ corresponding to a change in $x$. It may thus be said that a sample
value of $b$ is

(1) $b_{ik,jl} = \frac{y_{ik} - y_{jl}}{x_{ik} - x_{jl}}.$

Making use of the fact that

(2) $y_{ik} = a + bx_{ik} + \eta_{ik} - b\epsilon_{ik},$

equation (1) may be written

$(x_{ik} - x_{jl})\,b_{ik,jl} = (x_{ik} - x_{jl})\,b + (\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl}).$

Summing this equation over all combinations of points there is obtained

(3) $b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})} - \frac{\sum_i\sum_j\sum_k\sum_l \left[(\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl})\right]}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}.$

The summations in the above expression are to be carried out for

$l = 1, 2, \cdots, N_j;\ k = 1, 2, \cdots, N_i;\ j = 1, 2, \cdots, (i - 1);\ i = 1, 2, \cdots, n.$

The first term on the right side of equation (3) is an estimate of b and the second 
term represents the deviation of the estimate from the true value. Accordingly, 
we take as an estimate of b the statistic 

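(4) $\hat b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}.$

This requires, of course, that the denominator be not equal to zero. Summing
out the subscripts $k$ and $l$ reduces (4) to

$\hat b = \frac{\sum_{i=1}^{n}\sum_{j=1}^{i-1} N_iN_j(\bar y_i - \bar y_j)}{\sum_{i=1}^{n}\sum_{j=1}^{i-1} N_iN_j(\bar x_i - \bar x_j)},$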






where $\bar y_i$ is the mean value of the $y_{ik}$ and so forth. Summing out the subscript
$j$ gives



This expression may be put in a more convenient form by using the identity 

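$\sum_{i=1}^{n}\Big(N_i\sum_{j=1}^{i-1} N_j\bar y_j\Big) = \sum_{i=1}^{n}\Big(N_i\bar y_i\sum_{j=i+1}^{n} N_j\Big) = \sum_{i=1}^{n}\Big(N_i\bar y_i\Big(\sum_{j=1}^{n} N_j - \sum_{j=1}^{i} N_j\Big)\Big).$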


With this substitution equation (5) becomes 


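(6) $\hat b = \frac{\sum_{i=1}^{n}\Big[N_i\bar y_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\Big]}{\sum_{i=1}^{n}\Big[N_i\bar x_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\Big]}.$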

This is the statistic for estimating the linear trend of bivariate data. It may be 
noted that its derivation is not based on the notion of fitting a line to the sample 
points. A line $y = \hat a + \hat bx$ may be fitted to the sample points by making it pass
through the mean of the sample points, that is, by using the following estimate:

$\hat a = \bar y - \hat b\bar x,$

where $\bar y$ and $\bar x$ are the means of all the $y_{ik}$ and $x_{ik}$ respectively.
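The equivalence of the pairwise form (4) and the grouped form (6) can be illustrated on a small data set. The sketch below is not from the paper: the three groups of points are invented for illustration, the function names are hypothetical, and the weight $\sum_j N_j - 2\sum_{j \le i} N_j + N_i$ is written under the assumption that the inner sum in (6) runs from $1$ to $i$ (the opposite convention only changes the sign of numerator and denominator together).

```python
def pairwise_slope(groups):
    """Form (4): summed y-differences over summed x-differences,
    taken over all pairs of points from groups j < i."""
    num = den = 0.0
    for i in range(len(groups)):
        for j in range(i):
            for xa, ya in groups[i]:
                for xb, yb in groups[j]:
                    num += ya - yb
                    den += xa - xb
    return num / den

def weights(counts):
    """c_i = sum_j N_j - 2 * sum_{j <= i} N_j + N_i for each group i."""
    total, running, out = sum(counts), 0, []
    for n_i in counts:
        running += n_i
        out.append(total - 2 * running + n_i)
    return out

def grouped_slope(groups):
    """Form (6): ratio of weighted group means of y and of x."""
    counts = [len(g) for g in groups]
    c = weights(counts)
    num = sum(n * ci * sum(y for _, y in g) / n for n, ci, g in zip(counts, c, groups))
    den = sum(n * ci * sum(x for x, _ in g) / n for n, ci, g in zip(counts, c, groups))
    return num / den

def intercept(groups, slope):
    """Intercept from the overall means of all sample points."""
    pts = [p for g in groups for p in g]
    xbar = sum(x for x, _ in pts) / len(pts)
    ybar = sum(y for _, y in pts) / len(pts)
    return ybar - slope * xbar

if __name__ == "__main__":
    # Hypothetical sample: groups ordered by increasing mean, unequal sizes.
    groups = [[(1.0, 1.2), (1.3, 0.9)], [(2.1, 2.0)], [(2.9, 3.3), (3.2, 2.8), (3.0, 3.1)]]
    b = grouped_slope(groups)
    print(pairwise_slope(groups), b, intercept(groups, b))
```

The grouped form needs only the group means and sizes, which is what makes the statistic cheap to compute by hand.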


4. Consistency of the estimate. Having established the statistics $\hat b$ and $\hat a$ it
is desirable to examine the consistency and efficiency of the estimates, particularly
for $\hat b$. To determine that $\hat b$ is a consistent estimate we investigate the
behavior of (6) as the number of sample points increases, that is, as the $N_i \to \infty$.
We wish first to establish the following identity. Consider the sum of the
following array of terms:

$N_1(N_1 + N_2 + \cdots + N_n)$

$N_2(N_1 + N_2 + \cdots + N_n)$

$\cdots$

$N_n(N_1 + N_2 + \cdots + N_n)$

The sum may be written $\sum_{i=1}^{n} N_i\sum_{j=1}^{n} N_j$. Since the array is symmetrical the
expression $2\sum_{i=1}^{n} N_i\sum_{j=1}^{i} N_j$ also gives the sum of the array except for the fact that
the terms along the principal diagonal are counted twice. We have, therefore

$\sum_{i=1}^{n} N_i\sum_{j=1}^{n} N_j = 2\sum_{i=1}^{n} N_i\sum_{j=1}^{i} N_j - \sum_{i=1}^{n} N_i^2.$

Rearranging terms we obtain the identity

(7) $\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\Big] = 0.$

Now substituting (2) into (6) and making use of (7) there is obtained,

(8) $\hat b = b + \frac{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)(\bar\eta_i - b\bar\epsilon_i)\Big]}{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar x_i\Big]}.$

The $\eta_i$ and $\epsilon_i$ are random variables with zero means so that as $N_i \to \infty$ the sample
means $\bar\eta_i$ and $\bar\epsilon_i$ converge in probability to zero. As $N_i \to \infty$, $\bar x_i$ converges in
probability to its mean $X_i$. In view of (7) and that the denominator in (8)
is not equal to zero the last term in (8) converges in probability to zero and $\hat b \to b$.
The estimate is therefore consistent. A similar argument also shows the estimate
$\hat a$ to be consistent.


5. Efficiency of the estimate. A general investigation of the efficiency of the
estimate $\hat b$ is beyond the scope of this paper. We may note, however, that the
efficiency of the estimate can be made to depend upon the grouping of the data,
that is, the optimum efficiency of the estimate may depend upon the omission
of some of the pairs $(y_{ik} - y_{jl})$ from the estimate. The maximum efficiency is
obtained for $\hat b$ when the second term in (3) is minimized. This requires prior
knowledge of the frequency distribution of the random variables $x$ and $y$; however,
in applications a recognition of (3) may often indicate a practical method
of increasing the efficiency.

In what follows we make an investigation of the precision of the estimate $\hat b$
for a special case which is of some practical interest. Let $x$ and $y$ be random
variables as defined in the first part of the paper and consider the new variables
$u$ and $v$ defined by $\hat b = v/u$, that is,

$u = \sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar x_i\Big],$

$v = \sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar y_i\Big].$

The random variables $u$ and $v$ are then independently distributed with joint
probability element $f(u)\,f(v)\,du\,dv$. Making the change of variable $u = r\cos\theta$,
$v = r\sin\theta$ the probability element becomes $f(r, \theta)\,dr\,d\theta$ where $\tan\theta = v/u$. Integrating
out the variable $r$ gives the probability element for $\theta$. In what follows we
investigate the distribution of $\theta$ for the case where $x$ and $y$ are normally distributed
with the same variance. Since $u$ and $v$ are linear functions of $x$ and $y$
respectively they are also normally distributed with the same standard deviation.





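We designate the means of $u$ and $v$ by $m_1$ and $m_2$ respectively and the standard
deviation by $\sigma$. The probability element in $u$ and $v$ is then

(9) $\frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}\left[(u - m_1)^2 + (v - m_2)^2\right]\right\}du\,dv.$

Changing variables to $r$, $\theta$ and setting $m_1 = \bar r\cos\bar\theta$, $m_2 = \bar r\sin\bar\theta$ we obtain the
following probability element:

$\frac{r}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}\left[(r\cos\theta - \bar r\cos\bar\theta)^2 + (r\sin\theta - \bar r\sin\bar\theta)^2\right]\right\}dr\,d\theta.$

Completing the square in $r$ and substituting $\phi = \theta - \bar\theta$ there is obtained

(10) $\frac{r}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}(r - \bar r\cos\phi)^2\right\}\exp\left\{-\frac{\bar r^2}{2\sigma^2}\sin^2\phi\right\}dr\,d\phi.$

To integrate out $r$ make the further change of variable

$t = \frac{r}{\sigma} - \frac{\bar r}{\sigma}\cos\phi.$

Setting $\frac{\bar r}{\sigma}\cos\phi = w$ for convenience in notation there is obtained

(11) $\left\{\frac{t}{2\pi}\exp\left(-\frac{t^2}{2}\right) + \frac{w}{2\pi}\exp\left(-\frac{t^2}{2}\right)\right\}\exp\left\{-\frac{\bar r^2}{2\sigma^2}\sin^2\phi\right\}dt\,d\phi.$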


The variable t is to be integrated out of this expression. The corresponding 
limits of integration are exhibited by 


( 12 ) 


+ 


t 

— exp < 
w 

o 

1 

f+“ 

\/27r J 

1 exp 

—w 


{» 


d<(>. 


Now as the number of points in the original sample increases the value of $\bar r$
also increases and as $\sigma/\bar r \to 0$, with $|\phi| < \pi/2$, the value of $w \to \infty$. In this case
then (12) approaches asymptotically to

As $\sigma/\bar r \to 0$ this distribution shows that $\phi$ converges in probability to zero and
that the distribution approaches asymptotically to the normal form

(13)

It is required then to examine the conditions under which $\sigma/\bar r$ assumes small
values. If the variance of the original variables $x_i$ and $y_i$ is designated by $\sigma_1^2$




then since $u$ and $v$ are linear functions of $x_i$ and $y_i$ respectively the variance of $u$
and of $v$ is


Now $\bar r^2$ is the sum of the squares of the means of $u$ and $v$ so that

(15) $\bar r^2 = (1 + b^2)\Big\{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)X_i\Big]\Big\}^2.$

Dividing (14) by (15) we obtain

(16)

Inspection of (16) indicates that as the number of sample points $N_i$ increases the
value of $(\sigma/\bar r)^2$ decreases rapidly. To illustrate this we examine some particular
cases. Consider first the case of four equally spaced means $X_i = 3i\sigma_1$,
$(i = 1, 2, 3, 4)$ and let there be one sample point for each mean $(N_i = 1)$.
With these values there is obtained,

$\left(\frac{\sigma}{\bar r}\right)^2 = \frac{0.022}{1 + b^2}.$

For $b = 1$ the range $-9° < \phi < +9°$ includes 95% of the population defined by
(13). As the number of points $N_i$ is increased or as the number of means $X_i$
is increased the value of $(\sigma/\bar r)^2$ decreases rapidly. Consider now eight equally
spaced means $X_i = 3i\sigma_1$, $(i = 1, 2, \cdots, 8)$ with again one sample point for each
mean $(N_i = 1)$. With these values there is obtained

$\left(\frac{\sigma}{\bar r}\right)^2 = \frac{0.00045}{1 + b^2}.$

For $b = 1$ the range $-1° < \phi < +1°$ includes 95% of the population defined
by (13).

It is clear that a very high degree of precision is obtained with the estimate $\hat b$
when there is a considerable number of sample points. However, this will also
be true in general of other statistics and it is really of interest to compare precisions
in those cases where the statistics have a relatively low precision. A
detailed comparison is beyond the scope of this paper. However, a direct comparison
can be made very easily in the particular case when $x_i$ is a fixed variate





and only $y_i$ is a random variable. For the sake of simplicity, let each $N_i = 1$;
then the statistic for estimating $b$ is

(17) $\hat b = \frac{\sum_{i=1}^{n} i(y_i - \bar y)}{\sum_{i=1}^{n} i(x_i - \bar x)} = \frac{\sum_{i=1}^{n} y_i(i - \bar i)}{\sum_{i=1}^{n} x_i(i - \bar i)}.$

Since $\hat b$ is a linear function of the $y_i$ by a well known theorem its variance is

(18)

The customary least squares regression line of $y$ on $x$ gives for the estimate of $b$
and its variance

$b_{II} = \frac{\sum_{i=1}^{n} y_i(x_i - \bar x)}{\sum_{i=1}^{n} x_i(x_i - \bar x)}.$

In the particular case when the $x_i$ are equally spaced, $x_i = ci + d$, the estimates
$\hat b$ and $b_{II}$ are identical:

(19) $\hat b = b_{II} = \frac{12}{cn(n^2 - 1)}\sum_{i=1}^{n} y_i(i - \bar i).$

6. Numerical example. From a practical point of view the case where $x$ and
$y$ are random variables is of greater interest than where $x$ is a fixed variate. We
give a numerical example of this case comparing the statistic $\hat b$ with several other
statistics. Consider the case where there is one sample point for each mean $X_i$.
We shall evaluate the following:

1). The statistic of this paper which for this case is

$b_1 = \frac{\sum_{i=1}^{n} y_i(i - \bar i)}{\sum_{i=1}^{n} x_i(i - \bar i)}.$

2). The statistic obtained by minimizing the sum of the squares of the $y$
deviations only

$b_2 = \frac{\sum_{i=1}^{n} y_i(x_i - \bar x)}{\sum_{i=1}^{n} x_i(x_i - \bar x)}.$

3). The statistic obtained by minimizing the sum of the squares of the orthogonal
deviations

$b_3 = \frac{\sum(y_i - \bar y)^2 - \sum(x_i - \bar x)^2 + \left\{\left[\sum(y_i - \bar y)^2 - \sum(x_i - \bar x)^2\right]^2 + 4\left[\sum(y_i - \bar y)(x_i - \bar x)\right]^2\right\}^{1/2}}{2\sum(y_i - \bar y)(x_i - \bar x)}.$

TABLE I

Set    x1    y1    x2    y2    x3    y3    x4    y4
 1    1.1   1.4   2.4   2.0   3.0   2.7   3.6   4.3
 2    1.2   1.4   2.2   2.0   3.4   3.1   3.8   4.2
 3    1.0   1.4   1.6   2.1   2.8   3.2   4.4   4.3
 4    0.6   0.7   1.8   2.0   3.3   2.6   3.8   4.0
 5    0.7   1.4   1.7   1.7   2.7   3.4   4.1   4.1
 6    1.0   1.2   1.6   2.1   2.9   2.6   3.6   4.0
 7    1.3   0.7   1.7   2.1   2.7   2.9   4.0   3.6

TABLE II

Set                  b1       b2       b3       b4
 1                 1.160    1.068    1.220    1.162
 2                 1.056    1.009    1.059    1.027
 3                 0.860    0.843    0.803    0.870
 4                 0.946    0.896    0.924    0.830
 5                 0.875    0.867    0.913    1.000
 6                 0.978    0.939    0.981    0.846
 7                 1.044    0.959    1.045    1.000
Mean               0.990    0.940    0.996    0.962
7 × Sample Var.    0.0686   0.0373   0.1058   0.0834


4). The statistic proposed by Wald²

$b_4 = \frac{\sum_{i=1}^{n/2} y_i - \sum_{i=n/2+1}^{n} y_i}{\sum_{i=1}^{n/2} x_i - \sum_{i=n/2+1}^{n} x_i}.$

We apply these statistics to sample data having four means $X_i = i$ and $Y_i = i$,
$(i = 1, 2, 3, 4)$. By means of a table of random numbers seven sets of data were


2 Loc. cit.






obtained, each set having one sample point corresponding to each mean. These 
sample points are described by Table I where it will be noted that the sample 
points were drawn from a discrete distribution. The estimates obtained from 
the four statistics are exhibited in Table II. 

If the 28 sample points are treated as a single set of data and the four statistics 
in their appropriate forms are applied, there is obtained the following set of esti¬ 
mates: 

$b_1$        $b_2$        $b_3$        $b_4$

0.9768    0.9183    0.9786    0.9496

The preceding computations show that the estimate $b_2$ is inferior to the other
estimates, as would be expected. The estimate $b_3$ is most accurate when the 28
sample points are treated as a single set of data with the estimate $b_1$ being only
very slightly less accurate, $b_1 = 0.9768$ as compared to $b_3 = 0.9786$. When the
individual sets of sample points 1 to 7 are considered it is seen that the estimate
$b_1$ is most accurate with the estimate $b_3$ rather less accurate; the estimate $b_1$ is
more precise than $b_3$, the sample variances being in the ratio $0.0686 \div 0.1058 =
0.65$. From a practical viewpoint we may also point out that the computation
of $b_1$ requires very much less labor than the computation of $b_3$.
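Three of the four statistics are simple enough to recompute from Table I. The sketch below is an illustration rather than the authors' procedure: it transcribes Table I as read above, evaluates the statistic of the paper, the least squares slope of $y$ on $x$, and Wald's statistic for each set, and leaves out the orthogonal-deviation statistic. Function names are hypothetical, and small differences from Table II in the last digit are to be expected from rounding.

```python
# Table I: for each set, the four sample points (x_i, y_i), i = 1..4.
TABLE_I = {
    1: [(1.1, 1.4), (2.4, 2.0), (3.0, 2.7), (3.6, 4.3)],
    2: [(1.2, 1.4), (2.2, 2.0), (3.4, 3.1), (3.8, 4.2)],
    3: [(1.0, 1.4), (1.6, 2.1), (2.8, 3.2), (4.4, 4.3)],
    4: [(0.6, 0.7), (1.8, 2.0), (3.3, 2.6), (3.8, 4.0)],
    5: [(0.7, 1.4), (1.7, 1.7), (2.7, 3.4), (4.1, 4.1)],
    6: [(1.0, 1.2), (1.6, 2.1), (2.9, 2.6), (3.6, 4.0)],
    7: [(1.3, 0.7), (1.7, 2.1), (2.7, 2.9), (4.0, 3.6)],
}

def slope_paper(points):
    """Statistic 1): sum y_i (i - ibar) / sum x_i (i - ibar), one point per mean."""
    n = len(points)
    ibar = (n + 1) / 2
    num = sum(y * (i - ibar) for i, (_, y) in enumerate(points, start=1))
    den = sum(x * (i - ibar) for i, (x, _) in enumerate(points, start=1))
    return num / den

def slope_least_squares(points):
    """Statistic 2): least squares regression of y on x."""
    xbar = sum(x for x, _ in points) / len(points)
    num = sum(y * (x - xbar) for x, y in points)
    den = sum(x * (x - xbar) for x, _ in points)
    return num / den

def slope_wald(points):
    """Statistic 4): difference of half-sample sums of y over that of x."""
    half = len(points) // 2
    first, second = points[:half], points[half:]
    num = sum(y for _, y in second) - sum(y for _, y in first)
    den = sum(x for x, _ in second) - sum(x for x, _ in first)
    return num / den

if __name__ == "__main__":
    for k, pts in TABLE_I.items():
        print(k, round(slope_paper(pts), 3), round(slope_least_squares(pts), 3),
              round(slope_wald(pts), 3))
```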



ON THE EFFECT OF DECIMAL CORRECTIONS ON ERRORS OF 

OBSERVATION 


By Philip Hartman and Aurel Wintner 
The Johns Hopkins University 

1. Summary. Let $t$ be the true value of what is being measured and suppose
that the error of observation is a symmetric normal distribution of standard
deviation $\sigma$. The "rounding-off" error due to the reading of measurements to
the nearest unit has a distribution and an expected value depending on $t$ and $\sigma$.
It is shown that, for a fixed $\sigma > 0$, the expected value of the decimal correction,
$r(t; \sigma)$, is an analytic function of $t$ which is odd, of period 1, positive for $0 < t < \frac{1}{2}$,
and has a convex arch as its graph on $0 \le t \le \frac{1}{2}$. Furthermore, if $0 < t < \frac{1}{2}$,
both $r(t; \sigma)$ and its maximum value, $\max_t r(t; \sigma)$, are decreasing functions of $\sigma$.

2. Introduction. Let $X$ be an error of observation and let $\phi(x)$ denote the
density of probability of the distribution of $X$. In particular,

(1) $\int_{-\infty}^{+\infty}\phi(x)\,dx = 1$, where $\phi(x) \ge 0$.

If $t$ is any fixed number, the density of probability of the distribution of
$X + t$ is $\phi(x - t)$.

Besides the "instrumental error of observation", $X$, there is another error, that
of the "rounding-off", which is carried along in the registration of the measurements.
It is introduced by the circumstance that, if $\cdots, b, a$ are digits, and if
$b$ denotes the last digit considered, then decimal fractions such as $\cdots ba$ and
$\cdots ba \cdots$ are registered as $\cdots b$ if $a < 5$ and as $\cdots (b + 1)$ if $a > 5$. Let
the unit, in which the measurements are expressed, be so chosen that the first
digit neglected becomes the first digit following the decimal point, i.e., that the
error of the "rounding-off" is between $\pm\frac{1}{2}$. Then, if $t$ denotes the true value of
what is being measured, the remark made after (1) shows that the probability that
the error of the decimal corrections be less than $x$ is given by

$\sum_{n=-\infty}^{\infty}\int_{n-\frac{1}{2}}^{n+x}\phi(u - t)\,du,$

if $|x| \le \frac{1}{2}$, whereas this probability is 0 or 1 according as $x < -\frac{1}{2}$ or $x > \frac{1}{2}$.
Since the last series can be written in the form

(2) $\sum_{n=-\infty}^{\infty}\int_{-\frac{1}{2}}^{x}\phi(u + n - t)\,du = \int_{-\frac{1}{2}}^{x}\sum_{n=-\infty}^{\infty}\phi(u + n - t)\,du, \quad (\phi \ge 0),$

it follows that the density of probability of the error due to the decimal corrections
is

(3) $\sum_{n=-\infty}^{\infty}\phi(x + n - t)$ if $|x| < \frac{1}{2}$, and 0 if $|x| > \frac{1}{2}$.



Consequently, if $r = r(t)$ denotes the expected value of the decimal error induced
on the "true" value, $t$, of the observations, then

(4) $r(t) = \int_{-\frac{1}{2}}^{\frac{1}{2}} x\sum_{n=-\infty}^{\infty}\phi(x + n - t)\,dx.$

Formula (4) is known¹. It is usually based on its intuitive interpretation which
results if, on the one hand, (4) is written in the form

(5) $r(t) = \int_{-\infty}^{\infty} s(x)\,\phi(x - t)\,dx,$

where

(6) $s(x) = x$ if $-\frac{1}{2} < x < \frac{1}{2}$ and $s(x) = s(x + 1)$, $-\infty < x < \infty$,

and, on the other hand, the periodic function (6) is thought of as representing the
uniform distribution of the error of "rounding-off" over the arithmetical continuum
over a period,

$|x - n| < \frac{1}{2}, \quad (n = 0, \pm 1, \cdots),$

on the $x$-axis. Needless to say, the specification of $s(x)$ at the points $x = n + \frac{1}{2}$,
which are disregarded in the definition (6), is immaterial, since $s(x)$ occurs in
(5) only as an integrable weight-factor, isolated values of which do not influence
the integral.

It follows at once from (1), (5) and the continuity (almost everywhere) of
(6), that $r(t)$ is continuous.

3. Fourier analysis of r(t). Since the Fourier expansion of the periodic function
(6) is

(7) $s(x) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}\sin 2\pi nx = s(x \pm 1) = \cdots, \quad (|x| < \frac{1}{2}),$

it follows from (5) that²

(8) $r(t) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}\int_{-\infty}^{\infty}\phi(x)\sin 2\pi n(x + t)\,dx.$

Hence, if the sine in (8) is expressed in terms of $2\pi nx$ and $2\pi nt$,

(9) $\pi r(t) = -\sum_{n=1}^{\infty}(-1)^n n^{-1}(a_n\cos 2\pi nt + b_n\sin 2\pi nt),$

1 F. Zernike, "Wahrscheinlichkeitsrechnung und mathematische Statistik," Handbuch
der Physik, Vol. 3 (1928), pp. 476-476.

2 In view of (1), the term-by-term integration leading from (5) to (8) is justified by the
fact that the partial sums of the series (7) are uniformly bounded. Correspondingly, the
above deduction of (9) and (10) from (4) is equivalent to an application of Poisson's summation
formula. In this regard, cf. A. Wintner, "The sum formulae of Euler-Maclaurin and
the inversions of Fourier and Möbius," Am. Jour. of Math., Vol. 69 (1947), pp. 685-708,
the end of §1 (p. 687) and its application on p. 697.





where

(10) $b_n + ia_n = \int_{-\infty}^{\infty}\phi(x)\exp(2\pi inx)\,dx, \quad (n = 1, 2, \cdots).$

Let it be assumed that positive and negative errors of observation, when of the
same magnitude, are equally probable, i.e., that $\phi(x) = \phi(-x)$. Then (10)
shows that $a_n$ becomes 0. Hence, (9) reduces to

(11) $r(t) = -\sum_{n=1}^{\infty}(-1)^n(c_n/n)\sin 2\pi nt,$

where

(12) $c_n = \pi^{-1}\int_{-\infty}^{\infty}\phi(x)\cos 2\pi nx\,dx = 2\pi^{-1}\int_0^{\infty}\phi(x)\cos 2\pi nx\,dx.$

Clearly, $r(t)$ is an odd function whenever the density $\phi(x)$ is even.

4. The normal case. Suppose that $\phi(x)$ is the density of a symmetric normal
(Gaussian) distribution. Then, if $\sigma$ is the positive constant representing the
standard deviation of the errors of observation,

(13) $\phi(x) = (2\pi\sigma^2)^{-\frac{1}{2}}\exp(-\frac{1}{2}x^2/\sigma^2) \quad (0 < \sigma < \infty).$

It is clear from (5) and (6) that

(14) $r(t) \to s(t)$ if $\sigma \to 0$ in (13).

Actually, all that (14) says is a triviality, according to which the total error
becomes the decimal error when the measurements become infinitely sharp.
In this limiting case, that is, if $r(t) = s(t)$, it is seen from (6) that the graph of the
periodic function $r = r(t)$ is piecewise linear, and therefore discontinuous.

If $\sigma = 0$ is replaced by $0 < \sigma < \infty$, the jumps of $r(t)$ at $t = n - \frac{1}{2}$ disappear
(cf. the end of §3) and, as will be proved below,

(I) $r(t)$ is an analytic function which is odd, of period 1, and positive for $0 < t < \frac{1}{2}$
(hence negative for $-\frac{1}{2} < t < 0$), and

(II) the graph of $r = r(t)$ over the fundamental interval $0 \le t \le \frac{1}{2}$ is a convex
arch, no matter what the value of $\sigma$ in (13) may be.

Since $r$ now depends both on the "true" value, $t$, of the observations and the
"precision", $\sigma$, of the measurements, let $r$ be denoted by $r(t; \sigma)$. It will be shown
that

(i) $\max r(t; \sigma)$, where the Max refers to $t$ while $\sigma$ is fixed, is a decreasing function
of $\sigma$, where $\sigma$ varies on the half-line $0 < \sigma < \infty$; and that, on the same half-line,

(ii) $r(t; \sigma)$ is a decreasing function of $\sigma$ at every fixed $t$ contained in the fundamental
region $0 < t < \frac{1}{2}$.

All of this seems to be clear for physical reasons. Actually, it is easy to give 
examples of distribution laws distinct from (13) for which the above assertions 
become false. 





5. The $\vartheta$-function. As is well-known,

$\int_{-\infty}^{\infty}\exp(-\frac{1}{2}x^2/\sigma^2)\cos ux\,dx = (2\pi\sigma^2)^{\frac{1}{2}}\exp(-\frac{1}{2}\sigma^2u^2).$

Hence, the value of the integral $\int_{-\infty}^{\infty}\phi(x)\cos 2\pi nx\,dx$ in (12) is $q^{n^2}$, if $q$ is an abbreviation for

(15) $q = \exp(-2\pi^2\sigma^2).$

Consequently, if $r(t, q)$ is defined, in terms of the above $r(t; \sigma)$, by placing

(16) $r(t, q) = r(t; \sigma)$ in virtue of (15),

then (11) shows that³

(17) $r(t, q) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}q^{n^2}\sin 2\pi nt.$

It will be noted that the range, $0 < \sigma < \infty$, of the standard deviation is mapped
by (15) on the range

(18) $0 < q < 1,$

and that $\sigma$ decreases or increases according as $q$ increases or decreases.

Let partial differentiations with respect to t and q be denoted by primes and 
subscripts, respectively: 

(19) S' = df/dt, f, = df/dq. 

Thus, from (17), 

(20) r'(t, q) = -2 £ ( —1 ) n q n * cos 2 t rnt 

n = 1 

and, as easily verified from (17), 

(21) r q (t, q) = (- 4ir qY X r"(t, q). 

Let 6(t , q) be defined by 

00 

(22) 8(t, q) = 1 + 2 23 Q n * cos nt 

n— 1 

(so that B(t , <?) is, in the main, the elliptic theta-function usually denoted by 
tf 3 ). It is known that 

(23) e(t, q)> 0 
and that 4 

(24) 0'(£, g) < 0 if 0 < l < r (hence, 0'(t, q) > 0 if — ir < t < 0). 

The above assertions will be deduced from these facts. 


3 Cf. F. Zernike, loc. cit.

4 For a simple proof, cf. A. Wintner, "On the shape of the angular case of Cauchy's
distribution," Annals of Math. Stat., Vol. 18 (1947), pp. 589-593, §6.





6. Proof of (I)-(II) and (i)-(ii). First, it is seen from (17) and (22) that

(25) $r'(t, q) = 1 - \theta(2\pi t - \pi, q)$.

Hence,

(26) $r''(t, q) = -2\pi\,\theta'(2\pi t - \pi, q)$.

If (26) is compared with (24), it is seen that

(27) $r''(t, q) < 0$ if $0 < t < \tfrac12$ (hence, $r''(t, q) > 0$ if $-\tfrac12 < t < 0$).

Consequently, (I) and (II) follow, since, in view of (17),

(28) $r(\pm\tfrac12, q) = 0 = r(0, q)$.

Next, (21) and (27) imply that

(29) $r_q(t, q) > 0$ for $0 < t < \tfrac12$.

Hence, (ii) follows from the fact that $q$ is a decreasing function of $\sigma$.

As to (i), let $t = t(q)$ denote that (unique) $t$-value on $0 < t < \tfrac12$ at which
$r(t, q)$ assumes its maximum value, say $r^q$; so that

(30) $r^q = r(t(q), q)$, $(0 < t(q) < \tfrac12)$.

Clearly, $t = t(q)$ is the only $t$-value on $0 < t < \tfrac12$ for which

(31) $r'(t, q) = 0$.

Since $r'(t, q)$ possesses continuous partial derivatives with respect to $t$ and $q$,
and since (27) implies that its partial derivative with respect to $t$, namely, $r''(t, q)$,
does not vanish at $t = t(q)$, it follows that the solution $t = t(q)$ of the equation
(31) possesses a continuous derivative. Hence, the function (30) possesses a
continuous derivative with respect to $q$, namely,

(32) $dr^q/dq = r'(t(q), q)\,t'(q) + r_q(t(q), q)$.

But since $t = t(q)$ is a solution of (31), the identity (32) can be reduced to

$dr^q/dq = r_q(t(q), q)$, $(0 < t(q) < \tfrac12)$.

Consequently, (i) follows from (29), since $q$ is a decreasing function of $\sigma$.
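
The assertions (I), (i) and (ii) can be checked numerically by summing the series (17) directly. The following is only a sketch: the function names, the truncation at 60 terms and the grid of 400 points are illustrative choices, not part of the paper.

```python
import math

def r(t, sigma, terms=60):
    """Mean rounding error r(t; sigma), summed from the series (17)."""
    total = 0.0
    for n in range(1, terms + 1):
        # q**(n*n) with q = exp(-2*pi^2*sigma^2), computed through its exponent
        q_n2 = math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * n * n)
        total += (-1) ** n * q_n2 * math.sin(2.0 * math.pi * n * t) / n
    return -total / math.pi

def max_r(sigma, grid=400):
    """Largest value of r(t; sigma) on a grid over 0 < t < 1/2."""
    return max(r(0.5 * i / grid, sigma) for i in range(1, grid))

for sigma in (0.1, 0.2, 0.4):
    # r at the fixed point t = 1/4 and the grid maximum both shrink as sigma grows
    print(sigma, r(0.25, sigma), max_r(sigma))
```

For the three values of $\sigma$ tried here, both printed columns decrease as $\sigma$ increases, in line with (ii) and (i) respectively.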



WEIGHING DESIGNS AND BALANCED INCOMPLETE BLOCKS 

By K. S. Banerjee 
Pusa, Bihar, India

1. Introduction. Following a paper by Hotelling [1] on the weighing problem,
Kishen [4] and Mood [2] furnished generalized solutions. This note consists
of some additional remarks on the weighing problem when the weighing is
restricted to be made on one pan.

Hotelling remarked that when the problem was to determine a particular
difference or any other linear function of the weights, a different design should
be sought to minimize the variance. An account of efficient designs of this kind
is also furnished in this note. The notations used by Hotelling and
Mood have been used here.

2. Chemical balance problem. It has been shown by Mood that when
$N \equiv 0 \pmod 4$, an optimum design exists if a Hadamard matrix $H_N$ exists, and
is obtained by using any $p$ columns of $H_N$. When $N \equiv i \pmod 4$, $(i = 1, 2, 3)$,
very efficient designs are obtained either by adding to or deleting from the rows
of $H_{4k}$, making the resultant number of rows equal to $N$.

It has further been shown by Mood in connection with this class of designs
that arrangements$^1$ are available which are more efficient than the one obtained
by repeating the row of ones. As a matter of fact, if any row other than the row
of ones be repeated, this will lead to a design of the same efficiency as in the case
of repeated addition of the row of ones; for, the determinant of $X'X$ will remain
exactly identical. That this is so will be clear from the following properties
showing the connection of the matrix $X$ with the determinant $|a_{ij}|$:

(i) Any two rows of the matrix $X$ can be interchanged without changing the
determinant $|a_{ij}|$.

(ii) Any two columns of the matrix $X$ can be interchanged without changing
the determinant $|a_{ij}|$.

(iii) The signs of all the elements in a column of the matrix $X$ may be changed
without changing the determinant $|a_{ij}|$.

3. Spring balance problem. Mood has exhaustively discussed the designs 
when N > p. Efficient designs under this class will, however, be available from 
the arrangements afforded by balanced incomplete block designs discussed in 
[3]. These designs will be represented by certain of the efficient submatrices of 
the $P_k$ of Mood.

Usually v and b are used to denote respectively the number of varieties and the 
number of blocks in the above mentioned designs. Here v will take the place of 

1 This had been independently shown by me before the paper of A. M. Mood was brought 
to my notice by H. Hotelling. 




p, the number of objects to be weighed and b that of N, the number of weighings 
that can be made. The matrix $X'X$ in this case will take the form

(1) $X'X = \begin{pmatrix} r & \lambda & \lambda & \cdots & \lambda \\ \lambda & r & \lambda & \cdots & \lambda \\ \lambda & \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \vdots & & \vdots \\ \lambda & \lambda & \lambda & \cdots & r \end{pmatrix}$.

The variance of the estimated weight of each of the $p$ objects for such a design
can be easily seen to be

(2) $\dfrac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}\,\sigma^2$ for zero bias,

where $p$ is the number of objects to be weighed and $r$ and $\lambda$ have meanings similar
to those in connection with balanced incomplete block designs; that is, $r$ is the
number of times each object is weighed, and $\lambda$ is the number of times each pair
of objects is weighed together.

Though the minimum minimorum of $\sigma^2/N$ can never be attained by the objects
to be weighed under such designs, $\sigma^2/N$ may however be kept as the standard
with which the efficiency of a given design may be calculated. The efficiency
of the above design will therefore for zero bias be

(3) $\dfrac{(r - \lambda)\{r + \lambda(p - 1)\}}{N\{r + \lambda(p - 2)\}}$.


The identities well known in the theory of balanced incomplete blocks,

$bk = vr$, $\lambda(v - 1) = r(k - 1)$,

may, upon replacing $b$ by $N$ and $v$ by $p$ to accord with the notation of weighing
designs, be written

$r = Nk/p$, $\lambda = r(k - 1)/(p - 1)$.

Upon substituting these in (3) we obtain the efficiency factor in the form

(4) $\dfrac{k^2(p - k)}{p(pk - 2k + 1)}$,

where $k$ is the number of plots per block or the number of objects that can be
weighed at a time.
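
The reduction of (3) to (4) can be confirmed in exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the two parameter sets are those of the designs with $b = v = 7$, $k = 4$ and $b = v = 15$, $k = 8$ quoted in Section 4.

```python
from fractions import Fraction

def bibd_r_lambda(N, p, k):
    """r and lambda from the identities r = Nk/p, lambda = r(k-1)/(p-1)."""
    r = Fraction(N * k, p)
    lam = r * (k - 1) / (p - 1)
    return r, lam

def efficiency_from_r_lambda(N, p, r, lam):
    """Efficiency (3): sigma^2/N divided by the variance (2)."""
    r, lam = Fraction(r), Fraction(lam)
    return (r - lam) * (r + lam * (p - 1)) / (N * (r + lam * (p - 2)))

def efficiency_from_k(p, k):
    """Closed form (4), which depends only on p and k."""
    return Fraction(k * k * (p - k), p * (p * k - 2 * k + 1))

for N, p, k in ((7, 7, 4), (15, 15, 8)):
    r, lam = bibd_r_lambda(N, p, k)
    print(N, p, k, r, lam, efficiency_from_r_lambda(N, p, r, lam), efficiency_from_k(p, k))
```

In both cases the two efficiency columns agree, and $N$ drops out of the result as (4) indicates.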


If instead of adopting repetitions of $P_k$, only $\binom{p}{k}$ weighings be made in all,
the efficiency factor calculated for such a combinatorial design would be

$\dfrac{(r - \lambda)\{r + \lambda(v - 1)\}}{b\{r + \lambda(v - 2)\}}$ for zero bias,

where $r = \binom{v-1}{k-1}$, $\lambda = \binom{v-2}{k-2}$ and $b = \binom{v}{k}$.
The above expression on simplification reduces to (4).

It will be noticed that the efficiency of such designs depends only upon the 
total number of objects to be weighed and the number of such objects that can 
be weighed at a time. 

These designs have the advantage that all the weights are estimated with
equal precision. If a slightly larger number of weighings than is afforded
by the number of blocks in a balanced incomplete block design has to be made,
all the objects may be weighed together and this weighing be repeated as many
times as required. This will be equivalent to the repeated addition of the row
of ones. The repetition of the row of ones in particular is necessary to make the
weights estimable with equal precision, which, however, may be demanded at
times as a matter of necessity in certain experiments. Otherwise, any other
single row or different rows of the matrix $X$ may be repeated, making the number
of rows of the matrix $X$ equal to the number of weighings proposed to be made
in all.

From the practical point of view also, it will be advantageous to connect the
designs for weighing with the already existing balanced incomplete block
designs, which have been highly developed in recent years and are being extensively
used in agro-biological investigations.

4. Spring balance design for small p. Under this class of designs, Mood has
found the most efficient design for $p = 7$. It is given by

$L_7 = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}$.

This $L_7$ is easily recognized to be the design for $k = 4$, $b = 7$, $v = 7$, $r = 4$,
$\lambda = 2$, given by an orthogonal series [3]. It is therefore seen that Hadamard
matrices will lead to a new method of constructing balanced incomplete block
designs of a certain class. For example $H_{16}$ and $H_{20}$ will lead respectively to the
designs for $k = 8$, $b = 15$, $v = 15$, $r = 8$, $\lambda = 4$ (or for $k = 7$, $b = 15$, $v = 15$,
$r = 7$, $\lambda = 3$) and for $k = 10$, $b = 19$, $v = 19$, $r = 10$, $\lambda = 5$ (or $k = 9$, $b = 19$,
$v = 19$, $r = 9$, $\lambda = 4$). These designs also satisfy the condition of maximum
efficiency, by virtue of the fact that $|L_N|$ will have the value

$(N + 1)^{(N+1)/2}/2^N$,

as shown by Mood.
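
The block-design properties of the $7 \times 7$ design and the value of its determinant can be verified exactly. The sketch below uses the rows as read from the printed array; the helper names `determinant` and `gram` are illustrative and not taken from the paper.

```python
from fractions import Fraction

# Rows of the 7 x 7 spring balance design given above.
L7 = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det          # a row swap flips the sign
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return det

def gram(matrix):
    """X'X: inner products of the columns of X."""
    cols = list(zip(*matrix))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

print(abs(determinant(L7)), 8 ** 4 // 2 ** 7)
print(gram(L7)[0])
```

Each weighing involves $k = 4$ objects, $X'X$ has $r = 4$ on the diagonal and $\lambda = 2$ elsewhere, and the absolute value of the determinant agrees with $(N+1)^{(N+1)/2}/2^N$ for $N = 7$.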


5. Determination of a linear function of the objects. An orthogonalized
design which is cent percent efficient to determine individually the weight of $p$
unknown objects is not necessarily the design of maximum efficiency for the
estimation of a linear function of the objects. To illustrate this, let there be three
objects, the weights $O_1$, $O_2$, $O_3$ of which have to be estimated on a balance
corrected for zero bias, and let us, for this purpose, concentrate on the design
characterized by the matrix given below.


(5) 



As has been indicated in the previous papers, the variance of each of the unknown
objects comes out to be $\tfrac14\sigma^2$, which is the minimum minimorum, and as such the
above design enjoys cent percent efficiency where individual
estimation is concerned. But in estimating a linear function of the objects,
for instance the total weight, designs more efficient than this are available.

The variance of $W_1 + W_2 + W_3$ is known to be

(6)

where $c_{ij}$ denotes the elements of the matrix reciprocal to the matrix $X'X$.
As the above design furnishes the estimates of the unknown objects orthogonally,
the variance of the estimated total weight of the three objects will be given by
$\tfrac34\sigma^2$. If, however, the design given by the matrix

(7) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$

be adopted, the variance of the estimate of the total weight may be easily seen
to be $(3/7)\sigma^2$, by putting $l_1 = l_2 = l_3 = 1$. $(3/7)\sigma^2$ is evidently less than $\tfrac34\sigma^2$.

Therefore with four weighings, the design characterized by (7) is more efficient
in estimating the total weight than that characterized by (5). A still more
efficient design for getting the total weight is simply to weigh all the objects
together four times.
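
The two variances compared above can be reproduced by inverting $X'X$ exactly and summing its elements. This is a sketch under a stated assumption: the orthogonal four-weighing design used for comparison is a hypothetical stand-in with mutually orthogonal $\pm 1$ columns, chosen only because it gives $X'X = 4I$. The helper names are illustrative.

```python
from fractions import Fraction

def inverse(matrix):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if a[i][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for i in range(n):
            if i != col and a[i][col] != 0:
                factor = a[i][col]
                a[i] = [vi - factor * vc for vi, vc in zip(a[i], a[col])]
    return [row[n:] for row in a]

def variance_of_total(X):
    """Variance of the estimated total weight in units of sigma^2 (sum of all c_ij)."""
    cols = list(zip(*X))
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    return sum(sum(row) for row in inverse(xtx))

DESIGN_7 = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
# Assumed orthogonal design (three columns of a 4 x 4 Hadamard matrix); hypothetical stand-in.
ORTHOGONAL = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]

print(variance_of_total(DESIGN_7), variance_of_total(ORTHOGONAL))
```

The first value is $3/7$ and the second $3/4$, in units of $\sigma^2$; weighing all three objects together four times gives $1/4$, which is smaller still.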


6. Designs with arrangements afforded by balanced incomplete blocks. The
necessity for an efficient design to estimate any linear function of the objects
(or to be precise, say to estimate the total weight) will perhaps arise only when
the objects cannot all be weighed at a time collectively on a single pan. Here
also, an efficient design under the supposition that all the objects cannot be
weighed together is afforded by the arrangements in balanced incomplete blocks.
In such a design, the diagonal elements in the matrix reciprocal to $X'X$ will be
all positive and equal to

(8) $\dfrac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}$,

while the remaining elements in the reciprocal matrix will be negative and equal
to

(9) $\dfrac{-\lambda}{(r - \lambda)\{r + \lambda(p - 1)\}}$.

Using the generalized form of (6) and admitting of the possibility that any of the
arbitrary constants $l_i$ may be negative, the variance of the linear function
$\sum_{i=1}^{p} l_i O_i$ may be easily seen to be

(10) $\sigma^2\left[\dfrac{\sum l_i^2}{r - \lambda} - \dfrac{\lambda(\sum l_i)^2}{(r - \lambda)\{r + (p - 1)\lambda\}}\right]$.

If, however, in the above expression, the coefficients $l_i$ are equal to 1, (10) is the
variance of the estimated total weight, and reduces to

(11) $\dfrac{p\,\sigma^2}{r + (p - 1)\lambda}$.

When there are $N$ weighings in all, the minimum variance that can be reached
is $\sigma^2/N$ and will be attained, it appears, only when all the objects are weighed
together and the weighing is repeated $N$ times. The efficiency of a given design
may therefore be calculated with reference to $\sigma^2/N$. Remembering that the
number of weighings takes the place of the number of blocks and $p$ the place of $v$,
the efficiency of the design will reduce to $(k/p)^2$, where $k$ is the number of plots per
block i.e. the number of objects that can be weighed at a time.
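
The passage from (10) to (11) and to the efficiency $(k/p)^2$ can be checked in exact arithmetic. The following sketch uses illustrative function names and takes $r$ and $\lambda$ from the block-design identities $r = Nk/p$, $\lambda = r(k-1)/(p-1)$.

```python
from fractions import Fraction

def variance_linear(l, r, lam):
    """Formula (10), in units of sigma^2, for coefficients l = (l_1, ..., l_p)."""
    p = len(l)
    r, lam = Fraction(r), Fraction(lam)
    sum_of_squares = sum(Fraction(x) ** 2 for x in l)
    square_of_sum = Fraction(sum(l)) ** 2
    return sum_of_squares / (r - lam) - lam * square_of_sum / ((r - lam) * (r + (p - 1) * lam))

def efficiency_total(N, p, k):
    """(sigma^2/N) divided by the variance (11) of the estimated total weight."""
    r = Fraction(N * k, p)
    lam = r * (k - 1) / (p - 1)
    return Fraction(1, N) / variance_linear([1] * p, r, lam)

print(variance_linear([1] * 7, 4, 2))            # total weight, design with p = 7, r = 4, lambda = 2
print(efficiency_total(7, 7, 4), Fraction(4, 7) ** 2)
```

With all $l_i = 1$ the value of (10) coincides with $p/\{r + (p-1)\lambda\}$, and the efficiency equals $(k/p)^2$ for the designs tried.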

If, however, the combinatorial arrangement is adopted, weighing all possible
combinations of $k$ objects and making $\binom{p}{k}$ weighings in all, the same efficiency
as above will be obtained for such a design.

Given k , the above expression of efficiency will therefore be the deciding factor 
for choice between an arrangement of balanced incomplete block design and all 
possible combinations of k objects. 


7. Design of maximum efficiency. Designs leading to the matrix $X'X$ of
the type (1) have certain advantages inasmuch as the variances of the individual
objects are equal, as are also the covariances between all possible pairs. The
variance of the estimated total weight in such a design is given by (11). To
minimize the variance thus obtained, the expression

(12) $r + (p - 1)\lambda$

has to be the maximum for a given value of $p$. In an arrangement of the
balanced incomplete block type or in an arrangement with all possible combinations
of $k$ objects being weighed at a time, (12) would reduce to $rk$ and would therefore
increase with the increasing value of $rk$. This shows that the estimation of the
total weight will have increased precision if more of the objects are weighed at a
time.

If all the objects could be weighed at a time and both the pans be used for the
purpose, some of the elements in the matrix $X$ will be $-1$ instead of 0. This
would increase the value of $r$ but would decrease the value of $\lambda$. To devise the
best possible design, therefore, account will have to be taken simultaneously of
$r$ and $\lambda$.


REFERENCES

[1] Harold Hotelling, "Some improvements in weighing and other experimental
techniques", Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[2] A. M. Mood, "On Hotelling's weighing problem", Annals of Math. Stat., Vol. 17 (1946),
pp. 432-446.

[3] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research, Oliver and Boyd, London, 1938, pp. 10-13.

[4] K. Kishen, "On the design of experiments for weighing and making other types of
measurements", Annals of Math. Stat., Vol. 16 (1945), pp. 294-300.

[5] C. R. Rao, "On the most efficient designs in weighing", Sankhyā, Vol. 7 (1946), pp. 440



BOUNDS FOR SOME FUNCTIONS USED IN SEQUENTIALLY TESTING 
THE MEAN OF A POISSON DISTRIBUTION 1 


By Leon H. Herbach 
Brooklyn College 

1. Introduction. Let $z = \log\{f(x, \lambda_1)/f(x, \lambda_0)\}$, where $f(x, \lambda_i) = (e^{-\lambda_i}\lambda_i^x)/x!$,
$(i = 0, 1)$, is the elementary probability law of a Poisson variate $X$, under the
hypothesis that the mean is equal to $\lambda_i$. Without loss of generality we shall
assume $\lambda_1 > \lambda_0$.

Let $H_0$ be the hypothesis that the distribution of $X$ is given by $f(x, \lambda_0)$. Wald
[1, pp. 286-287] has devised general upper and lower bounds for the probability
of accepting $H_0$, when $\lambda$ is the true value of the parameter, and the sequential
probability ratio test is used. This probability is called the operating-characteristic
function and is designated by $L(\lambda)$. Using these results he has computed
the bounds for the binomial and normal distributions [2, pp. 137-142].
We shall do the same thing for the Poisson distribution, since the restrictions
[1, p. 284, conditions I to III] under which these general limits are valid can
rather easily be shown to apply to the Poisson distribution, if we make the
further restriction that $E(z) \ne 0$.

These general results are

$\dfrac{1 - B^h}{\delta A^h - B^h} < 1 - L(\lambda) < \dfrac{1 - \eta B^h}{A^h - \eta B^h}$, if $h > 0$,

and

(1) $\dfrac{1 - A^h}{\delta B^h - A^h} < L(\lambda) < \dfrac{1 - \eta A^h}{B^h - \eta A^h}$, if $h < 0$,

where $\alpha$, $\beta$ are probabilities of committing errors of the first and second kind
respectively and

(2) $A = (1 - \beta)/\alpha$, $B = \beta/(1 - \alpha)$,

$\eta = \operatorname{glb}_{\zeta}\, \zeta E(e^{hz} \mid e^{hz} < 1/\zeta)$, $\zeta > 1$;

$\delta = \operatorname{lub}_{\rho}\, \rho E(e^{hz} \mid e^{hz} \ge 1/\rho)$, $0 < \rho < 1$;

and $h$ is the non-zero root of the expression $Ee^{zh} = 1$. Hence the only remaining
unknowns are $\eta$ and $\delta$.


1 The author is indebted to Professor A. Wald for suggesting the problem which led to 
this note and for helpful discussions. 




The following bounds to $En$, the expected number of observations required
by the sequential probability ratio test defined by $\alpha$, $\beta$, have been derived [2, pp.
143-147]:

$\dfrac{L(\lambda)(\log B + \xi') + [1 - L(\lambda)]\log A}{Ez} \;\lessgtr\; En \;\lessgtr\; \dfrac{L(\lambda)\log B + [1 - L(\lambda)](\log A + \xi)}{Ez}$,

the upper or lower inequality signs holding according as $Ez > 0$ or $Ez < 0$, where

(3) $\xi' = \min_{r} E(z + r \mid z + r < 0)$,

and

(4) $\xi = \max_{r} E(z - r \mid z - r > 0)$, $(r > 0)$.

Using the limits to $L(\lambda)$, we then find $\xi$ and $\xi'$, which determine $En$.

2. Special terminology. By an almost-increasing function we shall mean one
that has the following properties: If $x$ is any point of discontinuity, then (a) $x + k$
is also a point of discontinuity, where $k$ is any integer, and $x + l$ is a point of
continuity if $l$ is not integral,
(b) $f(x - \epsilon) < f(x - \epsilon') < f(x)$ for $0 < \epsilon' < \epsilon < 1$, (c) $f(x - 1) < f(x)$, (d)
$\lim_{\epsilon \to 0} f(x + \epsilon) = f(x+) < f(x)$, (e) $f(x - 1 +) < f(x+)$. It is clear that the
minimum value for $f(y)$ in any closed interval $[a, b]$ is equal to $\min[f(a), f(a'+)]$,
where $a'$ is defined as $a$ if the closed interval contains no discontinuity, and as
the leftmost point of discontinuity otherwise. As special cases, if $a$ is a point of
discontinuity this minimum is $f(a+)$, and if $x < a < b < x + 1$ the minimum
is $f(a)$.

Almost-decreasing functions are defined similarly except that the inequalities
go the other way. In this case the maximum in the interval is $\max[f(a), f(a'+)]$
and we have special cases as above.

3. The case h > 0. Since $e^z = a^x e^{-c}$, where $a = \lambda_1/\lambda_0$ and $c = (\lambda_1 - \lambda_0)$, the
condition $e^{hz} < 1/\zeta$ may be expressed as $a^{xh} e^{-ch} < 1/\zeta$, whence

(5) $x < c/\log a - \log \zeta/(h \log a) = s - r$ (say).

Since $x \ge 0$, $r < s$. Hence $0 < r < s$. Also

(6) $Ee^{zh} = \sum_{x=0}^{\infty} (e^{-c}a^x)^h \dfrac{e^{-\lambda}\lambda^x}{x!} = \exp(-ch - \lambda + \lambda a^h)$,

and

(7) $\zeta E(e^{zh} \mid e^{zh} < 1/\zeta) = \zeta E[(e^{-c}a^x)^h \mid x < s - r]$.

From (5), $\zeta = a^{rh}$ and (7) becomes

(7.1) $a^{rh} e^{-ch} \sum_{x=0}^{[s-r]} \dfrac{e^{-\lambda}\lambda^x}{x!}\, a^{xh} \Big/ \sum_{x=0}^{[s-r]} \dfrac{e^{-\lambda}\lambda^x}{x!}$,

where $[s - r]$ is the largest integer $< (s - r)$. Our problem is to minimize (7)
with respect to $\zeta$. Since $r$ is a strictly increasing function of $\zeta$, this is equivalent
to minimizing $a^{rh} C/D = \theta$ (say) with respect to $r$, where

$C = \sum_{x=0}^{[s-r]} \dfrac{\lambda^x a^{xh}}{x!}$, and $D = \sum_{x=0}^{[s-r]} \dfrac{\lambda^x}{x!}$.

It will be shown that (7.1) is an almost-increasing function of $r$ and therefore
the minimum occurs at either $r = 0$ or $r = \nu+$, where $\nu = s - [s]$, since the
saltuses occur at $r = \nu + k$ for $k = 0, 1, 2, \cdots, [s]$.

Since $a^{rh}$ is an increasing function of $r$ and $C/D$ remains constant as long as
$[s - r]$ remains constant, condition (b) is fulfilled.

Conditions (c) to (e) refer to the saltuses only, hence, to show them, we may
assume, without loss of generality, that $r$ and $s$ are integral. We proceed by
induction, using the notation $\theta(w)$ to mean the value of $\theta$ when $r = w$, to show (c).
First we prove the following:

Lemma A. $\theta(s) > \theta(s - 1)$.

Proof: Since we assumed $\lambda_1 > \lambda_0$ and $h > 0$, $a^h > 1$. Hence $(1 + \lambda)a^h >
1 + \lambda a^h$, whence, a fortiori, $a^{sh} > a^{(s-1)h}(1 + \lambda a^h)/(1 + \lambda)$.

To show that if $\theta(r + 1) > \theta(r)$, then $\theta(r) > \theta(r - 1)$, we shall show that

(8) $CD + Db\,a^{(n+1)h} < CDa^h + Cb$

implies

(9) $CD + Dbq\,a^{(n+1)h} < CDa^h + Cbq\,a^h$,

where $n = s - r$, $b = \lambda^n/n!$, $q = \lambda/(n + 1)$.

Since, as we shall see below,

(10) $Db\,a^{(n+1)h}(q - 1) < Cb(qa^h - 1)$,

or

(11) $Da^{(n+1)h}(q - 1) < C(qa^h - 1)$,

addition of (8) and (10) yields the desired result, (9).

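It now remains to prove (11) or that

(12) $a^{(n+1)h}(\lambda - n - 1) \sum_{x=0}^{n} \dfrac{\lambda^x}{x!} < (\lambda a^h - n - 1) \sum_{x=0}^{n} \dfrac{\lambda^x a^{xh}}{x!}$.

Setting (6) equal to 1 we get $\lambda a^h = ch + \lambda$, which when substituted in (12)
yields

$(ch + \lambda)^{n+1}(\lambda - n - 1) \sum_{x=0}^{n} \dfrac{\lambda^x}{x!} < \lambda^{n+1}(ch + \lambda - n - 1) \sum_{x=0}^{n} \dfrac{(ch + \lambda)^x}{x!}$.

Upon letting $p = ch + \lambda$, we have

$\dfrac{\lambda - (n + 1)}{\lambda^{n+1}} \sum_{x=0}^{n} \dfrac{\lambda^x}{x!} < \dfrac{p - (n + 1)}{p^{n+1}} \sum_{x=0}^{n} \dfrac{p^x}{x!} = F(p)$, say.

Then our problem reduces to showing that $F(y)$ is increasing in $0 < \lambda \le y \le p$,
or that the derivative with respect to $y$, $F'(y)$, is positive.

$F'(y) = -\sum_{x=0}^{n-1} \dfrac{(n - x)\,y^{x-n-1}}{x!} + (n + 1) \sum_{x=0}^{n-1} \dfrac{(n - x)\,y^{x-n-1}}{(x + 1)!} + (n + 1)^2 y^{-(n+2)}$

$\ge (n + 1)^2 y^{-(n+2)}$, since $(n + 1) \ge (x + 1)$;

$> 0$, since $y > 0$.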


Thus condition (c) is demonstrated. To show (d) we must show that
$\theta(r+) < \theta(r)$, which means that

$a^{rh}\,\dfrac{C - b\,a^{nh}}{D - b} < a^{rh}\,\dfrac{C}{D}$.

But this is true if $C < Da^{nh}$, which is easily verified. Condition (e) is equivalent
to showing that

$a^{(r-1)h}\,\dfrac{C}{D} < a^{rh}\,\dfrac{C - b\,a^{nh}}{D - b}$,

which is proved just as (c) was.

Hence,

(13) $\eta = \min\left\{ e^{-ch} \sum_{x=0}^{[s]} \dfrac{e^{-\lambda}\lambda^x a^{hx}}{x!} \Big/ \sum_{x=0}^{[s]} \dfrac{e^{-\lambda}\lambda^x}{x!},\;\; a^{\nu h} e^{-ch} \sum_{x=0}^{[s-1]} \dfrac{e^{-\lambda}\lambda^x a^{hx}}{x!} \Big/ \sum_{x=0}^{[s-1]} \dfrac{e^{-\lambda}\lambda^x}{x!} \right\}$.

As special cases we have (i) if $s$ is integral, $\eta$ is the latter with $\nu = 0$ and (ii)
if $s < 1$, (b) is the only applicable condition and we have an ordinary increasing
function, hence $\eta$ is the former.
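
The two-candidate rule for $\eta$ can be compared with a direct search over $r$. The sketch below is illustrative only: the function names and the values $\lambda_0 = 4$, $\lambda_1 = 6$, $\lambda = 4.2$ are assumptions chosen so that $E(z) < 0$ (hence $h > 0$) and $s$ is not integral. The root $h$ is found by bisection from (6).

```python
import math

def solve_h(lam, a, c):
    """Positive root of lam*(a**h - 1) = c*h, i.e. of E exp(zh) = 1 by (6).

    Assumes E(z) = lam*log(a) - c < 0, which makes the non-zero root positive.
    """
    g = lambda h: lam * math.expm1(h * math.log(a)) - c * h
    lo, hi = 1e-6, 1.0
    while g(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def theta(r, m, lam, a, c, h):
    """a^(rh) e^(-ch) C/D as in (7.1), the sums running over x = 0..m."""
    C = sum((lam * a ** h) ** x / math.factorial(x) for x in range(m + 1))
    D = sum(lam ** x / math.factorial(x) for x in range(m + 1))
    return a ** (r * h) * math.exp(-c * h) * C / D

def eta_two_candidates(lam, lam0, lam1):
    a, c = lam1 / lam0, lam1 - lam0
    s = c / math.log(a)
    h = solve_h(lam, a, c)
    fl = math.floor(s)
    nu = s - fl
    candidates = []
    if nu > 0:                       # r = 0, sums up to [s]
        candidates.append(theta(0.0, fl, lam, a, c, h))
    if fl >= 1:                      # r = nu+, sums up to [s] - 1
        candidates.append(theta(nu, fl - 1, lam, a, c, h))
    return min(candidates)

def eta_grid_search(lam, lam0, lam1, steps=5000):
    a, c = lam1 / lam0, lam1 - lam0
    s = c / math.log(a)
    h = solve_h(lam, a, c)
    best = math.inf
    for i in range(1, steps):
        r = s * i / steps
        m = math.ceil(s - r) - 1     # largest integer strictly below s - r
        best = min(best, theta(r, m, lam, a, c, h))
    return best

print(eta_two_candidates(4.2, 4.0, 6.0), eta_grid_search(4.2, 4.0, 6.0))
```

For these values the grid search never falls below the smaller of the two candidates and approaches it closely, as the almost-increasing property would suggest.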

Similarly, it may be shown that

(14) $\delta = \max[\,e^{-ch}E(a^{xh} \mid x \ge \{s\}),\; a^{-\gamma h}e^{-ch}E(a^{xh} \mid x \ge \{s + 1\})\,]$,

where $\{s\}$ is the smallest integer $\ge s$ and $\gamma = \{s\} - s$. Here there is only one
special case, namely (i). If $h < 0$, $\delta$ is the larger of the two expressions on the
right side of (13) and $\eta$ is the smaller of the two corresponding expressions in
(14).





4. Since $z = -c + x \log a$, $\xi$ may be written

$\max_{t}\, \log a\, E(x - t \mid x \ge t)$,

where $t = (r + c)/(\log a)$. Hence $s = c/\log a \le t < \infty$. Therefore if we can
show that $E(x - t \mid x \ge t) = \gamma(t)$ (say) is an almost-decreasing function of $t$,
we will know that $\xi$ occurs either when $t = s$ or $\{s\}+$ since, as will be seen, the
jumps occur at integral $t$.

To show (c) we make use of the following, which is easily proven:

Lemma B. Let $X$, $Y$, $Z$ each be greater than zero. Then a necessary and
sufficient condition that $\dfrac{X}{Y} < \dfrac{X + Y}{Y + Z}$ is that $XZ < Y^2$.

Therefore, to show for integral $t$ that

(15) $\gamma(t) < \gamma(t - 1)$,

or that

$\dfrac{\sum_{x=t}^{\infty} (x - t)\lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x!} < \dfrac{\sum_{x=t}^{\infty} (x - t)\lambda^x/x! + \sum_{x=t}^{\infty} \lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x! + \lambda^{t-1}/(t - 1)!}$,

we need only show that, for all integral $t$,

(16) $\dfrac{\lambda^{t-1}}{(t - 1)!} \sum_{x=t}^{\infty} (x - t)\dfrac{\lambda^x}{x!} < \left[\sum_{x=t}^{\infty} \dfrac{\lambda^x}{x!}\right]^2$.

Since both sides of (16) are power series in $\lambda$ where the exponents start with $2t$,
we need only show that the coefficient of every term on the left is less than the
corresponding term on the right.

In the case of the coefficient of $\lambda^{2j+2t}$, $(j \ge 0)$, we have to show that

$\dfrac{2j + 1}{(t + 2j + 1)!\,(t - 1)!} < \dfrac{2}{(t + 2j)!\,t!} + \dfrac{2}{(t + 2j - 1)!\,(t + 1)!} + \cdots + \dfrac{1}{(t + j)!\,(t + j)!}$,

or by multiplying both sides by $(2t + 2j)!$ that

$(2j + 1)\binom{2t + 2j}{t - 1} < 2\binom{2t + 2j}{t} + 2\binom{2t + 2j}{t + 1} + \cdots + 2\binom{2t + 2j}{t + j - 1} + \binom{2t + 2j}{t + j} = M$, say.

Replacing all the binomial coefficients on the right by the smallest one we have

$M \ge (2j + 1)\binom{2t + 2j}{t} > (2j + 1)\binom{2t + 2j}{t - 1}$,

since

$\binom{n}{s - 1} < \binom{n}{s}$ for $n \ge 2s$.

Thus the truth of (16) has been established
for even exponents. The odd terms are treated similarly.

Hence, we have shown that $\gamma(t)$ is a strictly decreasing function of $t$, if $t$ takes
on integral values only. We shall now show (b), i.e. that

(17) $\gamma(t) = \dfrac{\sum_{x=t}^{\infty} (x - t)\lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x!} < \dfrac{\sum_{x=t}^{\infty} (x - t + \epsilon)\lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x!} = \gamma(t - \epsilon)$.

The denominators are equal and each term of the numerator on the right is
greater than the corresponding term on the left, hence (17) is valid.

Conditions (a) and (d) can be shown, by showing in a similar manner, that

(18) $\gamma(t+) = 1 + \gamma(t + 1)$

and $\gamma(t) < 1 + \gamma(t + 1)$ for integral $t$. By using (18) for $t$ and $t - 1$ together
with (15) we show $\gamma(t+) < \gamma(t - 1 +)$, which is condition (e), the inequalities
going the other way for an almost-decreasing function. Thus we have
shown that


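$\xi = \max\left\{ -c + \log a\, \dfrac{\sum_{x=\{s\}}^{\infty} x\lambda^x/x!}{\sum_{x=\{s\}}^{\infty} \lambda^x/x!},\;\; -\{s\}\log a + \log a\, \dfrac{\sum_{x=\{s+1\}}^{\infty} x\lambda^x/x!}{\sum_{x=\{s+1\}}^{\infty} \lambda^x/x!} \right\}$.

As in Section 3, $\xi'$ is the lower analogue of $\xi$, i.e.

$\xi' = \min\{-c + \log a\, E(x \mid x \le [s]),\; -[s]\log a + \log a\, E(x \mid x \le [s - 1])\}$,

and the special cases are as in that section.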
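
The behaviour of $\gamma(t) = E(x - t \mid x \ge t)$ and the resulting value of $\xi$ can be explored numerically. The sketch below is illustrative: the function names, the truncation of the Poisson sums at 200 terms, and the values $\lambda_0 = 4$, $\lambda_1 = 6$, $\lambda = 4.2$ are assumptions, and $s$ is taken to be non-integral.

```python
import math

def gamma_fn(t, lam, terms=200):
    """gamma(t) = E(x - t | x >= t) for a Poisson variate x with mean lam."""
    first = max(0, math.ceil(t))
    num = den = 0.0
    for x in range(first, first + terms):
        # lam^x / x!; the common factor exp(-lam) cancels in the ratio
        weight = math.exp(x * math.log(lam) - math.lgamma(x + 1.0))
        num += (x - t) * weight
        den += weight
    return num / den

def xi_two_candidates(lam, lam0, lam1):
    """Larger of log(a)*gamma at t = s and at t = {s}+, using gamma({s}+) = 1 + gamma({s}+1)."""
    a, c = lam1 / lam0, lam1 - lam0
    s = c / math.log(a)
    up = math.ceil(s)                       # {s}
    at_s = math.log(a) * gamma_fn(s, lam)
    just_above = math.log(a) * (1.0 + gamma_fn(up + 1, lam))
    return max(at_s, just_above)

def xi_grid_search(lam, lam0, lam1, steps=5000, span=10.0):
    a, c = lam1 / lam0, lam1 - lam0
    s = c / math.log(a)
    return max(math.log(a) * gamma_fn(s + span * i / steps, lam) for i in range(steps + 1))

print(xi_two_candidates(4.2, 4.0, 6.0), xi_grid_search(4.2, 4.0, 6.0))
```

At integral $t$ the computed $\gamma(t)$ decreases with $t$ and jumps upward immediately to the right of each integer, and the grid search stays at or below the two-candidate value.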


REFERENCES

[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), pp. 283-296.

[2] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16
(1945), pp. 117-186.



NOTES 

This section is devoted to brief research and expository articles and other short items . 


THE DISTRIBUTION OF STUDENT'S t WHEN THE POPULATION MEANS 

ARE UNEQUAL 

By Herbert Robbins 

Department of Mathematical Statistics, University of North Carolina

Let $x_1, \cdots, x_N$ be independent normal variates with the same variance $\sigma^2$
and with means $\mu_1, \cdots, \mu_N$ respectively. Set $n = N - 1$ and let

(1) $\bar{x} = \sum_{1}^{N} x_i/N$, $s^2 = \sum_{1}^{N} (x_i - \bar{x})^2/n$, $t = N^{1/2}\bar{x}/s$.

If all the $\mu_i$ are 0 then $t$ has Student's distribution with $n$ degrees of freedom; its
frequency function will be denoted here by

(2) $f_{n,0}(t) = n^{-1/2}\,[B(\tfrac{n}{2}, \tfrac12)]^{-1} \cdot (1 + t^2/n)^{-(n+1)/2}$.

When dealing with situations involving mixtures of populations or in which the
mean exhibits a secular trend, it is important to know the distribution of $t$
when the $\mu_i$ are arbitrary; in the general case let

(3) $\bar{\mu} = \sum_{1}^{N} \mu_i/N$, $\beta^2 = \sum_{1}^{N} (\mu_i - \bar{\mu})^2/N$,
    $\alpha = N\bar{\mu}^2/2\sigma^2$, $\lambda = N\beta^2/2\sigma^2$.

The distribution of $t$ will be shown to depend on the three parameters $n$, $\alpha$, $\lambda$.
If $\lambda = \beta^2 = 0$, so that all the $\mu_i$ are equal, then the distribution of $t$ determines
the power function of the ordinary $t$ test. We shall here consider the case in
which $\alpha = \bar{\mu} = 0$, although the $\mu_i$ are different. Denoting the frequency function
of $t$ in this case by $f_{n,\lambda}(t)$ we shall show that

(4) $f_{n,\lambda}(t) = f_{n,0}(t) \cdot \exp\left\{-\dfrac{\lambda t^2}{t^2 + n}\right\} \cdot F(-\tfrac12, \tfrac{n}{2}, -\lambda(1 + t^2/n)^{-1})$,

where $F$ denotes the confluent hypergeometric series, and where, since $\bar{\mu} = 0$,

(5) $\lambda = \sum_{1}^{N} \mu_i^2/2\sigma^2$.

In fact, the general distribution of $t$, of which (4) represents the case $\alpha = 0$,
may be derived as follows. Using the standard orthogonal transformation
[1, p. 387] let

(6) $z_i = \sum_{j=1}^{N} c_{ij}x_j$, $x_i = \sum_{j=1}^{N} c_{ji}z_j$ $(i = 1, \cdots, N)$,

where

(7) $c_{1j} = N^{-1/2}$ $(j = 1, \cdots, N)$;

then

(8) $t = n^{1/2} z_1 \Big/ \left(\sum_{i=2}^{N} z_i^2\right)^{1/2}$.

The joint frequency function of the $z_i$ is easily seen to be

(9) $(2\pi)^{-N/2} \cdot \sigma^{-N} \cdot \exp\left\{-\sum_{i=1}^{N} (z_i - a_i)^2/2\sigma^2\right\}$,

where

(10) $a_1 = N^{1/2}\bar{\mu}$, $\sum_{i=2}^{N} a_i^2 = N\beta^2$.


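Thus $t$ is the ratio of a non-central normal variate to the square root of an
independent non-central chi-square variate. It is known [2, p. 138] that the
frequency function of $q^2 = \sum_{i=2}^{N} z_i^2/\sigma^2$ is

(11) $e^{-\lambda} e^{-q^2/2} \sum_{j=0}^{\infty} \dfrac{\lambda^j (\tfrac12 q^2)^{\frac12 n + j - 1}}{2\, j!\, \Gamma(\tfrac12 n + j)}$,

where

(12) $\lambda = \sum_{i=2}^{N} a_i^2/2\sigma^2 = N\beta^2/2\sigma^2$.

The frequency function of $v = z_1/\sigma$ is

$g(v) = \dfrac{1}{\sqrt{2\pi}} \exp\left\{-\dfrac{(\sigma v - a_1)^2}{2\sigma^2}\right\} = \dfrac{1}{\sqrt{2\pi}}\, e^{-\alpha} e^{-v^2/2} \sum_{k=0}^{\infty} \dfrac{(2\alpha)^{k/2} v^k}{k!}$;

that of $q$ is, by (11),

$h(q) = e^{-\lambda} e^{-q^2/2} \sum_{j=0}^{\infty} \dfrac{\lambda^j q^{2j+n-1}}{2^{j + n/2 - 1}\, j!\, \Gamma((n/2) + j)}$, $(q > 0)$,

hence that of $u = v/q = n^{-1/2}t$ is

$\int_0^{\infty} h(q)\,g(uq)\,q\,dq$,

which, after integration, reduces to

(13) $\pi^{-1/2} e^{-(\lambda + \alpha)} \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \dfrac{\Gamma((N + 2j + k)/2)}{j!\,k!\,\Gamma(n/2 + j)}\, \lambda^j (2\alpha^{1/2}u)^k (1 + u^2)^{-(N + 2j + k)/2}$.

In particular, if $\alpha = \bar{\mu} = 0$ then (13) reduces by means of the relation
$F(\alpha, \gamma, x) = e^x F(\gamma - \alpha, \gamma, -x)$ to

(14) $[B(\tfrac{n}{2}, \tfrac12)]^{-1} \cdot e^{-\lambda u^2/(1 + u^2)} \cdot (1 + u^2)^{-N/2} \cdot F(-\tfrac12, \tfrac{n}{2}, -\lambda(1 + u^2)^{-1})$,

from which it follows that the frequency function of $t$ is given by (4).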
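
That (4) defines a proper frequency function can be checked by numerical integration. The sketch below is illustrative only: the confluent hypergeometric series is summed directly, the integration uses a composite Simpson rule on the assumed range $-40 \le t \le 40$, and the values $n = 10$, $\lambda = 2$ are arbitrary choices.

```python
import math

def kummer(a, c, x, terms=120):
    """Confluent hypergeometric series F(a, c, x) = sum_k (a)_k/(c)_k x^k/k!."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * x / ((c + k) * (k + 1))
        total += term
    return total

def f_central(t, n):
    """Student's frequency function (2) with n degrees of freedom."""
    log_beta = math.lgamma(n / 2.0) + math.lgamma(0.5) - math.lgamma((n + 1) / 2.0)
    return math.exp(-0.5 * math.log(n) - log_beta - 0.5 * (n + 1) * math.log1p(t * t / n))

def f_unequal_means(t, n, lam):
    """Frequency function (4): the case alpha = 0 with parameter lam."""
    w = 1.0 / (1.0 + t * t / n)             # 1 - w = t^2/(t^2 + n)
    return f_central(t, n) * math.exp(-lam * (1.0 - w)) * kummer(-0.5, n / 2.0, -lam * w)

def simpson(func, lo, hi, steps):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

print(simpson(lambda t: f_central(t, 10), -40.0, 40.0, 4000))
print(simpson(lambda t: f_unequal_means(t, 10, 2.0), -40.0, 40.0, 4000))
```

Both integrals come out equal to 1 to within the accuracy of the quadrature.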

Again, let $x_1, \cdots, x_{N_1+N_2}$ be independent normal variates with the same
variance $\sigma^2$ and with means $\mu_1, \cdots, \mu_{N_1+N_2}$ respectively. Set $n_1 = N_1 - 1$,
$n_2 = N_2 - 1$, $n = n_1 + n_2$, and let

(15) $\bar{x}_1 = \sum_{1}^{N_1} x_i/N_1$, $\bar{x}_2 = \sum_{N_1+1}^{N_1+N_2} x_i/N_2$,
     $s_1^2 = \sum_{1}^{N_1} (x_i - \bar{x}_1)^2/n_1$, $s_2^2 = \sum_{N_1+1}^{N_1+N_2} (x_i - \bar{x}_2)^2/n_2$,
     $s^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$, $t = [N_1N_2/(N_1 + N_2)]^{1/2}(\bar{x}_1 - \bar{x}_2)/s$.

If all the $\mu_i$ are equal then $t$ again has Student's distribution with $n$ degrees of
freedom. In the general case let

(16) $\bar{\mu}_1 = \sum_{1}^{N_1} \mu_i/N_1$, $\bar{\mu}_2 = \sum_{N_1+1}^{N_1+N_2} \mu_i/N_2$,
     $\beta_1^2 = \sum_{1}^{N_1} (\mu_i - \bar{\mu}_1)^2/N_1$, $\beta_2^2 = \sum_{N_1+1}^{N_1+N_2} (\mu_i - \bar{\mu}_2)^2/N_2$.

Then we may show as before [1, p. 388] that in this case $u = n^{-1/2}t$ has the
frequency function (13), where now

(17) $N = N_1 + N_2 - 1$, $\lambda = (N_1\beta_1^2 + N_2\beta_2^2)/2\sigma^2$,
     $\alpha = [N_1N_2/(N_1 + N_2)](\bar{\mu}_1 - \bar{\mu}_2)^2/2\sigma^2$.

In particular, when $\alpha = \bar{\mu}_1 - \bar{\mu}_2 = 0$, so that $\bar{\mu}_1 = \bar{\mu}_2 = \bar{\mu}$, say, the frequency
function $f_{n,\lambda}(t)$ of $t$ is again given by (4), where now

(18) $\lambda = \sum_{1}^{N_1+N_2} (\mu_i - \bar{\mu})^2/2\sigma^2$.

Extensions in this direction to the general linear hypothesis in the analysis of
variance will not be treated here.
If we set

(19) $w = (1 + t^2/n)^{-1}$,

where $t$ has the frequency function (4), then $w$ will have the frequency function

(20) $g_{n,\lambda}(w) = [B(\tfrac{n}{2}, \tfrac12)]^{-1} \cdot e^{-\lambda(1-w)} \cdot w^{\frac{n}{2} - 1} \cdot (1 - w)^{-1/2} \cdot F(-\tfrac12, \tfrac{n}{2}, -\lambda w)$,

for $0 < w < 1$. Thus for every $t > 0$,

(21) $\int_{-t}^{t} f_{n,\lambda}(x)\,dx = \int_{(1 + t^2/n)^{-1}}^{1} g_{n,\lambda}(w)\,dw$.

It would be interesting to have numerical values of the integral on the left side
of (21) for that value of $t$ for which

(22) $1 - \int_{-t}^{t} f_{n,0}(x)\,dx = 0.01$ or $0.05$ (say),

but existing tables (e.g. those in [2] and [3]) of the integral of (20) were compiled
for a different purpose and do not supply this information. The following
remarks throw some light on this subject.

Let us set

(23) $R(t) = f_{n,\lambda}(t)/f_{n,0}(t) = \exp\left\{-\dfrac{\lambda t^2}{t^2 + n}\right\} \cdot F(-\tfrac12, \tfrac{n}{2}, -\lambda(1 + t^2/n)^{-1})$
     $= \{1 - \lambda(t^2/n)/(1 + t^2/n) + o(\lambda)\} \cdot \{1 + \lambda/(n + t^2) + o(\lambda)\}$
     $= 1 + \lambda(n + t^2)^{-1}(1 - t^2) + o(\lambda)$.

Then as $\lambda \to 0$ we have ultimately

(24) $R(t) > 1$ if $|t| < 1$,
     $R(t) < 1$ if $|t| > 1$.

Hence for any $t > 1$ and for sufficiently small $\lambda$,

(25) $1 - \int_{-t}^{t} f_{n,\lambda}(x)\,dx \le 1 - \int_{-t}^{t} f_{n,0}(x)\,dx$.

The exact range of values of $t$ for which $R(t) < 1$ depends of course on $n$ and
$\lambda$. However we shall show that always

(26) $R(t) < 1$ if $|t| > 1$,

so that (25) holds for all $n$ and $\lambda > 0$, provided $t > 1$. The proof is as follows.
In terms of $w$ we have

(27) $R(t) = e^{-\lambda(1-w)} \cdot F(-\tfrac12, n/2, -\lambda w) = e^{-\lambda} F((n + 1)/2, n/2, \lambda w)$.

Now

(28) $F((n + 1)/2, n/2, \lambda w) = 1 + \sum_{k=1}^{\infty} \dfrac{(n + 1)(n + 3) \cdots (n + 2k - 1)}{n(n + 2) \cdots (n + 2k - 2)} \cdot \dfrac{(\lambda w)^k}{k!}$,

and by induction on $k$ we may show that for all $k = 1, 2, \cdots$,

(29) $\dfrac{(n + 1)(n + 3) \cdots (n + 2k - 1)}{n(n + 2) \cdots (n + 2k - 2)} \le 1 + k/n$,

where the equality holds only for $k = 1$. Hence

(30) $F((n + 1)/2, n/2, \lambda w) < 1 + \sum_{k=1}^{\infty} (1 + k/n) \cdot (\lambda w)^k/k! = e^{\lambda w}(1 + \lambda w/n)$,

(31) $R(t) < e^{-\lambda(1-w)} \cdot (1 + \lambda w/n) < e^{-\lambda(1-w)} \cdot e^{\lambda w/n} = e^{-\lambda[1 - w(1 + 1/n)]}$.

Hence $R(t) < 1$ if $w < n/(n + 1)$, which is equivalent to (26).
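
The chain (27) to (31) lends itself to a numerical spot check. In the sketch below the function names, the series truncation at 200 terms and the grid of $(n, \lambda, t)$ values are illustrative assumptions; `R` follows the first form in (27) and `R_alt` the second.

```python
import math

def kummer(a, c, x, terms=200):
    """Confluent hypergeometric series F(a, c, x) = sum_k (a)_k/(c)_k x^k/k!."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * x / ((c + k) * (k + 1))
        total += term
    return total

def R(t, n, lam):
    """Ratio of frequency functions, first form in (27)."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam * (1.0 - w)) * kummer(-0.5, n / 2.0, -lam * w)

def R_alt(t, n, lam):
    """Second form in (27), obtained from the relation F(a, c, x) = e^x F(c - a, c, -x)."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam) * kummer((n + 1) / 2.0, n / 2.0, lam * w)

def bound_31(t, n, lam):
    """Right-hand end of (31): exp(-lam * [1 - w(1 + 1/n)])."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam * (1.0 - w * (1.0 + 1.0 / n)))

for n in (2, 5, 20):
    for lam in (0.1, 1.0, 5.0):
        print(n, lam, [round(R(t, n, lam), 6) for t in (0.0, 1.01, 3.0)])
```

In every case tried the two forms of (27) agree, the ratio is below the bound (31), and it is below 1 for $t > 1$.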


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and
illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 127-149.

[3] Emma Lehmer, "Inverse tables of probabilities of errors of the second kind," Annals
of Math. Stat., Vol. 15 (1944), pp. 388-398.


A DISTRIBUTION-FREE CONFIDENCE INTERVAL FOR THE MEAN 

By Louis Guttman 
Cornell University 

1. Summary. Consider a random sample of $N$ observations $x_1, x_2, \cdots, x_N$,
from a universe of mean $\mu$ and variance $\sigma^2$. Let $m$ and $s^2$ be the sample mean
and variance respectively:

(1) $m = \dfrac{1}{N} \sum_{i=1}^{N} x_i$, $s^2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - m)^2$.

It is shown that the following conservative confidence interval holds for $\mu$:

(2) Prob $\{(m - \mu)^2 \le s^2/(N - 1) + \lambda\sigma^2\sqrt{2/N(N - 1)}\} > 1 - \lambda^{-2}$,

where $\lambda$ is any positive constant. Inequality (2) also holds if, in the braces, $\lambda$
is replaced by $\sqrt{\lambda^2 - 1}$, with $\lambda \ge 1$.

Inequality (2) is much more efficient on the average than Tchebychef's
inequality for the mean, namely,

(3) Prob $\{(m - \mu)^2 \le \lambda^2\sigma^2/N\} > 1 - \lambda^{-2}$,

yet (2) and (3) are both distribution-free, requiring only knowledge about $\sigma^2$.
At the $1 - \lambda^{-2} = .99$ level of confidence, the expected value of the right member
in the braces of (2) is only about 1/6 the corresponding member of (3); at the
.999 level of confidence the ratio is about 1/20.



CONFIDENCE INTERVAL FOR MEAN 


411 


A more general inequality than (2) is developed, also involving only the single 
parameter a. 


2. Derivation. Consider the function 

(4) $u = (m - \mu)^2 - s^2/(N-1) - c\sigma^2,$

where c is an arbitrary constant. It is easily verified that $Eu = -c\sigma^2$, and that

(5) $Eu^2 = \sigma^4[2/N(N-1) + c^2].$

A basic feature of (5) is that the only population parameter in the right member
is $\sigma^2$. Contrary to what might have been surmised, the fourth moment of x
about $\mu$ is not involved, and indeed need not exist.

According to Tchebychef's inequality,

(6) $\text{Prob}\,\{-\lambda\sqrt{Eu^2} \le u \le \lambda\sqrt{Eu^2}\} > 1 - \lambda^{-2},$

where $\lambda$ is an arbitrary positive number. Using (4) and (5), it is possible to write
(6) as:

(7) $\text{Prob}\,\{s^2/(N-1) + \sigma^2[c - \lambda\sqrt{2/N(N-1) + c^2}] \le (m-\mu)^2 \le s^2/(N-1) + \sigma^2[c + \lambda\sqrt{2/N(N-1) + c^2}]\} > 1 - \lambda^{-2}.$

In the braces of (7), if the left member is negative, there is no harm in replacing
it by zero; if it is positive, then replacing it by zero may only increase the
probability of the braces. Regardless of the value of this left member, it is true that

(8) $\text{Prob}\,\{(m-\mu)^2 \le s^2/(N-1) + \sigma^2[c + \lambda\sqrt{2/N(N-1) + c^2}]\} > 1 - \lambda^{-2}.$

If we set c = 0, we have inequality (2). Some improvement over (2) is obtained
by determining c to minimize the right member in the braces of (8), yielding as
the shortest confidence interval:

(9) $\text{Prob}\,\{(m-\mu)^2 \le s^2/(N-1) + \sigma^2\sqrt{2(\lambda^2-1)/N(N-1)}\} > 1 - \lambda^{-2}.$

Inequality (9) differs from (2) only by replacing $\lambda$ in the braces by $\sqrt{\lambda^2 - 1}$.
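
As an illustration only, the bound in the braces of (2) and of (9) might be computed as follows. This is a sketch under stated assumptions: the function name `guttman_bound` and its arguments are hypothetical, the universe variance is supplied by the user as `sigma2`, and the sample variance uses the divisor N as in (1).

```python
import math

def guttman_bound(sample, sigma2, lam, shortest=True):
    """Return (m, bound) such that (m - mu)^2 <= bound with probability > 1 - lam**-2.

    shortest=True uses sqrt(lam^2 - 1) as in (9) and needs lam >= 1;
    shortest=False uses lam itself as in (2).
    """
    N = len(sample)
    m = sum(sample) / N
    s2 = sum((x - m) ** 2 for x in sample) / N          # divisor N, as in (1)
    factor = math.sqrt(lam * lam - 1) if shortest else lam
    bound = s2 / (N - 1) + sigma2 * factor * math.sqrt(2 / (N * (N - 1)))
    return m, bound

# Example: four observations, known variance 1, lam = 2 (confidence above 0.75).
m, bound = guttman_bound([1, 2, 3, 4], sigma2=1.0, lam=2.0)
half_width = math.sqrt(bound)   # the interval for mu is m -/+ half_width
```

Since the bound is on the squared deviation, its square root gives the half-width of the interval about m.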


3. Comparison with Tchebychef's inequality. The expected value of the
right member of the braces in (2) is

(10) $\sigma^2[1/N + \lambda\sqrt{2/N(N-1)}].$

The ratio of (10) to the corresponding value of Tchebychef's inequality (3),
namely $\lambda^2\sigma^2/N$, is

(11) $[1 + \lambda\sqrt{2N/(N-1)}]/\lambda^2.$

Since (11) decreases as $\lambda$ increases, the efficiency of inequality (2) increases
compared with that of Tchebychef as the level of confidence $1 - \lambda^{-2}$ increases. The
squared interval of (2) involves only the first power of $\lambda$, while that of (3)
involves the second power.
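
The ratios quoted in the Summary can be reproduced from (11). The following sketch is illustrative; the function name `efficiency_ratio` and the choice N = 100 are assumptions made for the example.

```python
import math

def efficiency_ratio(lam, N):
    """Ratio (11): expected right member of (2) over the right member of (3)."""
    return (1 + lam * math.sqrt(2 * N / (N - 1))) / lam ** 2

# lam = 10 gives the .99 level; lam = sqrt(1000) gives the .999 level.
ratio_99 = efficiency_ratio(10.0, 100)
ratio_999 = efficiency_ratio(math.sqrt(1000.0), 100)
```

With N = 100 these come out near 0.15 and 0.046, in line with the rough figures 1/6 and 1/20.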

4. Approach to normality. If the fourth moment of the universe's distribution
exists, then it is well known that the ratio of $E(m - \mu)^4$ to $\sigma^4/N^2$ must
approach 3, the ratio for the normal distribution, as N increases. That is, if
$\alpha^2 + 1$ is the ratio, then $\lim_{N \to \infty} \alpha^2 = 2$. It is known$^1$ that Tchebychef's
inequality can be replaced by one involving both $\alpha$ and $\sigma^2$, and that

(12) $\text{Prob}\,\{(m - \mu)^2 \le \sigma^2(1 + \lambda\alpha)/N\} > 1 - \lambda^{-2}.$

If $\alpha^2 = 2$, then the right member in the braces of (12) becomes $\sigma^2(1 + \lambda\sqrt{2})/N$.
This is virtually the same as (10), the expected value from (2). In a sense, then,
(2) implicitly takes account of the fact that the distribution of sample means
approaches that of the normal distribution with respect to the fourth moment.
A striking feature, however, is that (2) holds for any N > 1 and does not even
presume the fourth moment of the universe to exist, whereas to set $\alpha = \sqrt{2}$ in
(12) in general requires a large N and finite universe fourth moment.

5. Further possibilities. Confidence interval (2) is derived from but one
of a series of general intervals, each of which depends only on $\sigma^2$. It may be
possible to derive from this series even more efficient intervals, according to the
method now to be outlined.

One way of arriving at (2) is to consider all products of the form
$(x_i - \mu)(x_j - \mu)$, where i > j and i, j = 1, 2, $\cdots$, N. Let $p_2$ be the mean of these
N(N - 1)/2 products. It can easily be seen that $p_2 = u$ in (4) with c = 0,
so that $p_2$ is a second degree polynomial in $m - \mu$, the coefficients being sample
statistics. A more general quadratic would be $u_2 = p_2 + c_1 p_1 + c_0$, where $c_1$
and $c_0$ are arbitrary constants and $p_1$ is the mean of the N values $(x_i - \mu)$, or
$p_1 = m - \mu$. It is easily seen that $Ep_1 = Ep_2 = Ep_1p_2 = 0$, and that the only
universe parameter involved in $Ep_1^2$ and $Ep_2^2$ is $\sigma^2$. Hence the only universe
parameter upon which $Eu_2^2$ depends is also $\sigma^2$.
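
The identity between the mean of the pairwise products and the function u of (4) with c = 0 can be confirmed exactly on a small sample. The sketch below uses exact rational arithmetic; the function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def p2_direct(xs, mu):
    """Mean of the N(N-1)/2 products (x_i - mu)(x_j - mu) with i > j."""
    pairs = list(combinations(xs, 2))
    return sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)

def p2_from_statistics(xs, mu):
    """(m - mu)^2 - s^2/(N - 1), with s^2 computed with divisor N."""
    N = len(xs)
    m = sum(xs) / N
    s2 = sum((x - m) ** 2 for x in xs) / N
    return (m - mu) ** 2 - s2 / (N - 1)

sample = [Fraction(v) for v in (3, 1, 4, 1, 5)]
mu = Fraction(1)
```

Both functions return the same rational number for any sample and any value of mu, which is the sense in which the mean product is a quadratic in m - mu with sample statistics as coefficients.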

Higher degree polynomials in $m - \mu$ can be defined, possessing the same
properties as $u_2$. Let $p_3$ be the mean of the N(N - 1)(N - 2)/3! products of
the form $(x_i - \mu)(x_j - \mu)(x_k - \mu)$, where i > j > k and i, j, k = 1, 2, $\cdots$, N;
etc.; and let $p_N = (x_1 - \mu)(x_2 - \mu) \cdots (x_N - \mu)$. Set $p_0 = 1$, and let

(13) $u_n = \sum_{a=0}^{n} c_{an} p_a \qquad (n = 1, 2, \cdots, N),$

where the $c_{an}$ are arbitrary constants. It is easily seen that $Ep_a = 0$ (a > 0),
$Ep_ap_b = 0$ ($a \ne b$), and that each $Ep_a^2$ depends on only the parameter $\sigma^2$ as far
as the universe is concerned. Hence $Eu_n^2$ depends only on $\sigma^2$. Furthermore, by
writing $x_i - \mu = (x_i - m) + (m - \mu)$, it is seen that $p_a$ is a polynomial of degree
a in $m - \mu$, the coefficients being sample statistics. From (13), then, $u_n$ is a
polynomial of degree n in $m - \mu$ with statistics as coefficients.

$^1$ See, for example, Louis Guttman, “An inequality for kurtosis,” Annals of Math.
Stat., Vol. 19 (1948), pp. 277-278.

According to Tchebychef's inequality,

(14) $\text{Prob}\,\{u_n^2 \le \lambda^2 Eu_n^2\} > 1 - \lambda^{-2}.$

The interval for $u_n^2$ in the braces can be expressed in two statements:

(15) $f_n(m - \mu) = u_n - \lambda\sqrt{Eu_n^2} \le 0,$

(16) $g_n(m - \mu) = u_n + \lambda\sqrt{Eu_n^2} \ge 0.$

Both $f_n$ and $g_n$ are polynomials of degree n in $m - \mu$, $g_n$ exceeding $f_n$ always by
the additive constant $2\lambda\sqrt{Eu_n^2}$. Let $q_n$ and $Q_n$ be the smallest and largest real
zeros respectively of $f_n$, and let $r_n$ and $R_n$ be the smallest and largest real zeros
respectively of $g_n$.

For convenience, we can suppose that $c_{nn}$, the coefficient of $(m - \mu)^n$ in $u_n$,
is positive. If n is even, then $f_n$ is positive for $m - \mu > Q_n$ and for $m - \mu < q_n$.
Hence the interval $q_n \le m - \mu \le Q_n$ contains all the points included in (15)
and possibly more. Since the probability of (15) is not less than the probability
of (14), we can write the following confidence interval:

(17) $\text{Prob}\,\{q_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2} \qquad (n \text{ even}).$

The problem remains to determine the $c_{an}$ so as to minimize the expected value
of $Q_n - q_n$. Inequality (9) provides the minimum for the case n = 2. This
can be verified by adding the term $c_1p_1$ to u in (4) and finding that the minimum
requires $c_1 = 0$.

If n is odd, we again may set $c_{nn} > 0$. Then $f_n > 0$ for $m - \mu > Q_n$, and
$g_n < 0$ for $m - \mu < r_n$. The interval $r_n \le m - \mu \le Q_n$ thus contains at least
all the points found jointly in (15) and (16) and hence forms a conservative
confidence interval:

(18) $\text{Prob}\,\{r_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2} \qquad (n \text{ odd}).$

Again, the problem is to determine the $c_{an}$ so as to minimize the expected value
of $Q_n - r_n$. Tchebychef's inequality (3) does this for the case n = 1.

Although the only population parameter involved throughout is $\sigma^2$, the sample
moments up to the nth order are present in (15) and (16). It thus seems
plausible that improvement over inequality (9) should be possible for n > 2. To
obtain such an improvement requires developing a distribution-free theory of
the zeros of $f_n$ and $g_n$ beyond the quadratic case.





ON THE COMPOUND AND GENERALIZED POISSON DISTRIBUTIONS 

By E. Cansado Maceda 
University of Madrid 

1. Summary. In this note we deduce several properties of the compound 
and generalized Poisson distributions; in particular their closure and divisibility 
properties. An infinite class of functions whose members are both compound 
and generalized Poisson distributions is exhibited, and several of the distributions 
of Neyman, Polya, etc. are identified. The present note stems from a paper by 
Feller [2]. 

2. The compound Poisson distribution. If $F(x \mid \alpha)$ is a family of distribution
functions depending on the parameter $\alpha$, and $U(\alpha)$ is a distribution function
such that it assigns zero probability to any $\alpha$ domain for which $F(x \mid \alpha)$ is
undefined, then

$G(x) = \int_{-\infty}^{\infty} F(x \mid \alpha)\, dU(\alpha)$

is a distribution function. In particular if $F(x \mid \alpha)$ is the Poisson distribution
with mean $\alpha$, and U(0) = 0, G(x) is called the compound Poisson distribution
associated with the distribution function $U(\alpha)$; cf. Feller [2]. Clearly G(x) is a
step function over the non-negative integers, the saltus at the point x = n being

$\pi_n = \int_0^{\infty} e^{-\alpha} \frac{\alpha^n}{n!}\, dU(\alpha), \qquad n = 0, 1, 2, \cdots.$

It is convenient to introduce the factorial moment generating function
(f.m.g.f.) for G(x) as follows

$\omega(z) = E((1 + z)^x) = \sum_{n=0}^{\infty} \pi_n (1 + z)^n = \int_0^{\infty} e^{\alpha z}\, dU(\alpha) = \phi(z)$

where $\phi(z)$ is the ordinary moment generating function (m.g.f.) for $U(\alpha)$. This
gives a convenient relationship between the moments of $U(\alpha)$ and its associated
compound Poisson distribution.

On account of the multiplicative properties of $\omega(z)$ and $\phi(z)$ under the
convolution of G(x) and $U(\alpha)$ respectively, it is seen that the compound Poisson
distributions form a closed family, and if $G_1(x)$ and $G_2(x)$ are two compound Poisson
distributions associated with $U_1(\alpha)$ and $U_2(\alpha)$ respectively then $G_1(x) * G_2(x)$ is
associated with $U_1(\alpha) * U_2(\alpha)$. In addition, if $U(\alpha)$ is infinitely divisible (cf.
Cramér [1]) then G(x) is also, since it can be factored into the convolution of
arbitrarily many compound Poisson distributions.





Choosing in particular $U(\alpha)$ as the Pearson type III distribution, the
associated function is the Pólya-Eggenberger distribution, and if $U(\alpha)$ is a Poisson
distribution the associated function is the Neyman contagious distribution of
Type A.

3. The generalized Poisson distribution. If $F(x \mid \alpha)$, defined for non-negative
integers $\alpha = 0, 1, 2, \cdots$, is the $\alpha$-fold convolution of a given distribution
F(x) with itself, i.e. $F(x \mid \alpha) = F(x)^{*\alpha}$, and $U(\alpha)$ is the Poisson distribution with
parameter a, then the distribution function

$G(x) = \int_0^{\infty} F(x \mid \alpha)\, dU(\alpha)$

is called the generalized Poisson distribution associated with F(x).

If $\Omega(z)$ is the f.m.g.f. of F(x) then for the f.m.g.f. of G(x) we have

$\omega(z) = \sum_{n=0}^{\infty} (\Omega(z))^n e^{-a} \frac{a^n}{n!} = e^{a(\Omega(z) - 1)}.$

It follows that $\omega(z)$ can be written as an infinite product $\prod_{\nu} \omega_\nu(z)$ where each
$\omega_\nu(z)$ is the f.m.g.f. of a generalized Poisson distribution, and thus $\omega(z)$ belongs
to the infinitely divisible family. Moreover, if $G_1(x)$ and $G_2(x)$ are two generalized
Poisson distributions associated with $F_1(x)$ and $F_2(x)$, whose f.m.g.f.'s are
$\Omega_1(z)$ and $\Omega_2(z)$, with parameters $a_1$ and $a_2$ respectively, then
$G(x) = G_1(x) * G_2(x)$ has for f.m.g.f.

$\omega_1(z)\,\omega_2(z) = \exp\left\{(a_1 + a_2)\left(\frac{a_1\Omega_1(z) + a_2\Omega_2(z)}{a_1 + a_2} - 1\right)\right\},$

and G(x) is again a generalized Poisson distribution function associated with
the distribution

$F(x) = \frac{a_1 F_1(x) + a_2 F_2(x)}{a_1 + a_2}$

and with the parameter $a_1 + a_2$. Thus the generalized Poisson distributions
form a closed family. The analytic nature of the generalized Poisson distributions
has been studied by Hartman and Wintner [3]. As noted by Feller [2]
the various Neyman contagious distributions are generalized Poisson distributions.

4. Further remarks. From the above observations it is clear that a necessary
and sufficient condition for a distribution to be a compound Poisson distribution
is that its f.m.g.f. be of the form

(1) $\omega_1(z) = \phi(z)$

where $\phi(z)$ is the ordinary m.g.f. of a non-negative random variable. Likewise a
necessary and sufficient condition for $\omega(z)$ to be the f.m.g.f. of a generalized
Poisson distribution is that it be of the form

(2) $\omega_2(z) = e^{a(\Omega(z) - 1)}, \qquad a > 0,$

where $\Omega(z)$ is the f.m.g.f. of an arbitrary distribution function F(x). If we
choose $\phi(z) = e^{a(e^{cz} - 1)}$ and $\Omega(z) = e^{cz}$, then $\omega_1(z) = \omega_2(z)$, and the distribution
whose f.m.g.f. is $\omega_1(z)$ (the Neyman contagious distribution of Type A) is
simultaneously a compound and a generalized Poisson distribution (cf. Feller [2]).
We now show that there is an infinite class of distributions with this property.
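
The Type A example can be checked numerically. The sketch below is an illustration under stated assumptions: the saltus at n is computed as a Poisson mixture whose mean is c times a Poisson variable with parameter a, the sums are truncated at 80 terms, and the function names are hypothetical. The f.m.g.f. built from these probabilities is then compared with the closed form exp{a(e^{cz} - 1)}.

```python
import math

def neyman_type_a_pmf(n, a, c, kmax=80):
    """Saltus at n: Poisson probability of n with mean c*k, mixed over k ~ Poisson(a)."""
    total = 0.0
    for k in range(kmax):
        weight = math.exp(-a) * a ** k / math.factorial(k)
        mean = c * k
        if mean == 0:
            prob = 1.0 if n == 0 else 0.0   # a zero mean puts all mass at n = 0
        else:
            prob = math.exp(-mean) * mean ** n / math.factorial(n)
        total += weight * prob
    return total

def fmgf_from_pmf(z, a, c, nmax=80):
    """Sum of pi_n (1 + z)^n, truncated at nmax terms."""
    return sum(neyman_type_a_pmf(n, a, c) * (1 + z) ** n for n in range(nmax))

def fmgf_closed_form(z, a, c):
    """exp{a (e^{cz} - 1)}, the common value of forms (1) and (2) for this choice."""
    return math.exp(a * (math.exp(c * z) - 1))
```

For moderate a and c the two functions agree to many decimal places, which is what the coincidence of the two forms requires.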

First note that if $\phi(z)$ is the m.g.f. of an arbitrary distribution, then
$\exp\{a(\phi(z) - 1)\}$ is also the m.g.f. of a d.f., and in fact is the m.g.f. of the generalized
Poisson distribution associated with the distribution whose m.g.f. is $\phi(z)$. Now
let $\phi(z)$ be the m.g.f. of an arbitrary non-negative random variable, and define

(3) $\omega(z) = \exp\{a(\phi(z) - 1)\}, \qquad a > 0.$

Then $\omega(z)$ is simultaneously of the forms (1) and (2), since $\phi(z)$ is, by (1), also
the f.m.g.f. of a distribution function, i.e. the compound Poisson distribution
associated with the distribution whose m.g.f. is $\phi(z)$. However, not every
distribution which is both a compound and a generalized Poisson distribution can
be generated in this manner. For example, the Pólya-Eggenberger distribution
is easily shown to be both a generalized and a compound Poisson distribution,
yet its f.m.g.f.

$\omega(z) = (1 - dz)^{-h/d}, \qquad d > 0,\ h > 0,$

manifestly is not of the form (3), since this would imply that
$\phi(iz) = 1 - \frac{h}{ad} \log(1 - diz)$ is a characteristic function. But $|\phi(iz)|$ is
unbounded as $z \to \pm\infty$ and thus is not the characteristic function of a distribution.

REFERENCES 

[1] H. Cramér, “Problems in probability theory,” Annals of Math. Stat., Vol. 18 (1947),
pp. 165-193.

[2] W. Feller, “On a general class of contagious distributions,” Annals of Math. Stat.,
Vol. 14 (1943), pp. 389-400.

[3] P. Hartman and A. Wintner, “On the infinitesimal generators of integral
convolutions,” Am. Jour. of Math., Vol. 64 (1942), pp. 272-279.


ON CONFIDENCE LIMITS FOR QUANTILES 

By Gottfried E. Noether 
Columbia University 

In finding confidence limits for quantiles it is usual to determine two order
statistics $Z_i$ and $Z_j$ which with a given probability contain the unknown quantile
between them. The values of i and j corresponding to a given confidence
coefficient can be determined with the help of the distribution laws of order statistics
as is shown, e.g., in Wilks [1]. The purpose of this note is to determine i and j
with the help of a confidence band for the unknown cumulative distribution
function.

In what follows we shall always denote the cumulative distribution function
(cdf) by F(x), i.e., $F(x) = P\{X \le x\}$. Then the quantile $q_p$ is determined by

(1) $F(q_p - 0) \le p \le F(q_p)$

which reduces to

(1') $F(q_p) = p$

if F(x) is continuous. Given a sample of size n we can construct the sample
cdf $F_n(x)$ defined by $F_n(x) = (1/n)$ (number of observations $\le x$). Confidence
coefficients will always be denoted by $1 - \alpha$.

Assume that we can construct two step functions L(x) and U(x) parallel to
$F_n(x)$ such that for any fixed value x

(2) $P\{L(x) \le F(x) \le U(x)\} = 1 - \alpha.$

We do not require that the confidence band determined by L(x) and U(x) cover
the graph of the unknown cdf F(x) with probability $1 - \alpha$, but only that for any
arbitrarily chosen value x (2) is true.

Let

$L(x) = \eta_k, \qquad U(x) = \theta_k$

for $z_k \le x < z_{k+1}$, k = 0, 1, $\cdots$, n, where $z_k$ is the value taken by the order
statistic $Z_k$ and $z_0 = -\infty$, $z_{n+1} = +\infty$. Then if F(x) is continuous it follows
from (2) that a confidence interval with confidence coefficient $1 - \alpha$ for $q_p$ is
given by

(3) $Z_i \le q_p < Z_j$

where i and j are determined by

(4) $\theta_{i-1} < p, \qquad \theta_i \ge p$

(5) $\eta_{j-1} \le p, \qquad \eta_j > p.$

It will be noted that (3) represents a half-open interval. However as long as
we only admit continuous cdf's the confidence coefficient is not changed if we use

(3') $Z_i < q_p < Z_j$

or

(3'') $Z_i \le q_p \le Z_j$

instead. This is no longer true if we also admit discontinuous cdf's. Then the
confidence coefficient connected with (3') is $\le 1 - \alpha$, while that connected with
(3'') is $\ge 1 - \alpha$, as follows immediately from consideration of the possible
outcomes when (1) is true. This is the same result as that obtained by Scheffé
and Tukey [2].

We shall now indicate how $\eta_k$ and $\theta_k$ can be obtained and find their values in a
particular case. For any arbitrary value x we can consider $F_n(x)$ as the sample
estimate of the unknown parameter p = F(x) of a binomial distribution. Clopper
and Pearson [3] have discussed how confidence intervals for the unknown
parameter of a binomial variate can be found. Thus we can determine $\eta_k$
and $\theta_k$ correspondingly, but as is well known (2) cannot be achieved with
probability exactly equal to $1 - \alpha$. We shall have to be satisfied with probability
$\ge 1 - \alpha$. Consequently the same will hold true for the confidence coefficient
connected with the confidence interval for $q_p$.

In many cases central confidence intervals seem to be more desirable, at least
intuitively, than others. Our method produces such central confidence intervals
for the unknown quantile if we use central confidence intervals in the
construction of the confidence band. In that case $\eta_k$ and $\theta_k$ are determined by

(6) $\frac{\alpha}{2} = I_{\eta_k}(k,\, n - k + 1)$

(7) $\frac{\alpha}{2} = I_{1-\theta_k}(n - k,\, k + 1), \qquad k = 0, 1, \cdots, n,$

except that $\eta_0 = 0$, $\theta_n = 1$ by definition, where

$I_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\, dt \Big/ \int_0^1 t^{p-1}(1 - t)^{q-1}\, dt$

is the incomplete beta function. Scheffé [4] has pointed out how the tables of
percentage points of the incomplete beta function by C. M. Thompson, etc.
[5] can be used to find $\eta_k$ and $\theta_k$.
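
In place of tables, the band limits and the indices i and j might be found by direct computation. The sketch below is an illustration under stated assumptions: it uses the standard identity that the incomplete beta function $I_x(k, n-k+1)$ equals the probability that a binomial variable with parameters n and x is at least k, solves (6) and (7) by bisection, and takes i as the least index with $\theta_i \ge p$ and j as the least index with $\eta_j > p$. All function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P{Bin(n, p) <= k}."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

def eta(k, n, alpha):
    """Lower band value: solves P{Bin(n, eta) >= k} = alpha/2; eta_0 = 0 by definition."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):                      # the upper tail increases with eta
        mid = (lo + hi) / 2
        if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def theta(k, n, alpha):
    """Upper band value: solves P{Bin(n, theta) <= k} = alpha/2; theta_n = 1 by definition."""
    if k == n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):                      # the lower tail decreases with theta
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def quantile_indices(n, p, alpha):
    """Indices (i, j) of the order statistics bounding q_p; j = n + 1 means +infinity."""
    i = next(k for k in range(n + 1) if theta(k, n, alpha) >= p)
    j = next((k for k in range(n + 1) if eta(k, n, alpha) > p), n + 1)
    return i, j
```

For n = 20, p = 1/2 and alpha = .05 this gives the pair (6, 15), the sixth and fifteenth order statistics.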

We shall show now that in the case of the median M the solution based on
(3)-(7) leads to the same confidence interval as that suggested originally by W.
R. Thompson [6]. Thompson found that for $k < (n + 1)/2$

(8) $P\{Z_k < M < Z_{n-k+1}\} = 1 - 2 I_{1/2}(n - k + 1,\, k)$

provided the unknown distribution had a continuous cdf. (8) can be used to
maximize k under the condition that the righthand side is $\ge 1 - \alpha$.
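
The maximization of k in (8) is easily carried out by machine. In the sketch below, which is illustrative only, the incomplete beta function at the argument 1/2 is replaced by the equivalent binomial tail, $I_{1/2}(n-k+1, k) = P\{\text{Bin}(n, 1/2) \le k - 1\}$; the function names are hypothetical.

```python
import math

def median_coverage(k, n):
    """Right member of (8): 1 - 2 P{Bin(n, 1/2) <= k - 1}."""
    tail = sum(math.comb(n, r) for r in range(k)) / 2 ** n
    return 1 - 2 * tail

def largest_k(n, alpha):
    """Largest k with 2k < n + 1 whose coverage is at least 1 - alpha (0 if none)."""
    best = 0
    k = 1
    while 2 * k < n + 1:
        if median_coverage(k, n) >= 1 - alpha:
            best = k
        k += 1
    return best
```

With n = 20 and alpha = .05 the largest admissible k is 6, with coverage about .9586, so that the interval runs from the sixth to the fifteenth order statistic.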

We shall first show that our method leads to the same kind of a confidence
interval, i.e., one with i = l, j = n - l + 1. This follows immediately from the
fact that by (6) and (7)

(9) $1 - \theta_i = \eta_{n-i}.$

For let

(10) $\theta_{l-1} < \tfrac{1}{2}$ and $\theta_l \ge \tfrac{1}{2},$

then by (9) $\eta_{n-l} \le \tfrac{1}{2}$ and $\eta_{n-l+1} > \tfrac{1}{2}$.

It remains to be shown that k as determined by (8) equals l. This will be so
if we can show that

(11) $I_{1/2}(n - l + 1,\, l) < \frac{\alpha}{2} \le I_{1/2}(n - l,\, l + 1).$

Remembering that $I_x(p, q)$ is a monotonically increasing function of x we get
with the help of (7) and (10)

$\frac{\alpha}{2} = I_{1-\theta_{l-1}}(n - l + 1,\, l) > I_{1/2}(n - l + 1,\, l)$

and

$\frac{\alpha}{2} = I_{1-\theta_l}(n - l,\, l + 1) \le I_{1/2}(n - l,\, l + 1)$

which proves (11).

In conclusion it may be worth while pointing out that the formula

$P\{Z_i < q_p < Z_j\} = I_p(i,\, n - i + 1) - I_p(j,\, n - j + 1)$

given, e.g., in Wilks [1] for the continuous case can be obtained by a slight
modification of (6).


REFERENCES 

[1] S. S. Wilks, “Order statistics,” Am. Math. Soc. Bull., Vol. 54 (1948), pp. 6-50.

[2] H. Scheffé and J. W. Tukey, “Non-parametric estimation. I. Validation of order
statistics,” Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.

[3] C. J. Clopper and E. S. Pearson, “The use of confidence or fiducial limits illustrated
in the case of the binomial,” Biometrika, Vol. 26 (1934), pp. 404-413.

[4] H. Scheffé, “Note on the use of the tables of percentage points of the incomplete beta
function to calculate small sample confidence intervals for binomial p,”
Biometrika, Vol. 33 (1944), p. 181.

[5] C. M. Thompson, E. S. Pearson, L. J. Comrie, and H. O. Hartley, “Tables of
percentage points of the incomplete beta function,” Biometrika, Vol. 32 (1941), pp.
151-181.

[6] W. R. Thompson, “On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form,” Annals of Math. Stat.,
Vol. 7 (1936), pp. 122-128.


A LOWER BOUND FOR THE EXPECTED TRAVEL AMONG m RANDOM 

POINTS 

By Eli S. Marks 
Bureau of the Census 

In connection with cost determinations in sampling problems, it is frequently
necessary to determine the amount of travel among m random sample points in
an area. A lower bound for the expected value of this distance is found to be:

$\frac{1}{2}\sqrt{A}\;\frac{m - 1}{\sqrt{m}},$

where A is the measure of the area from which the m random points are
drawn.$^1$

If in a finite area S we locate m points at random (see Figure 1), we can trace 
a continuous path among the m points by starting at some point and connecting 
the points by line segments. The points can be connected in any order so that 
the path touches each point only once (unless it intersects itself at one of the 
random points). We are interested in a lower bound for the expected value of 
the length of the shortest of the m! possible paths. 



Fig. 1. m Random Points in S. 

We have above an area S in which m random points have been selected (with
m = 14).

The shortest path among the m points consists of m - 1 “links” (line segments)
between two points. Each link can be assigned to one of its end points, leaving
some pre-designated point (e.g., the m-th point selected) with no link assigned.
The link assigned to the i-th random point ($x_{(i)}$) must be no less than $r_{(i)}$, the
distance from $x_{(i)}$ to the nearest of the other (m - 1) points. If we denote the
length of the shortest path by L:

$L \ge \sum_{i=1}^{m-1} r_{(i)},$

$E(L) \ge \sum_{i=1}^{m-1} E(r_{(i)}).$

Let $E_x(r_{(i)})$ be the expected value of $r_{(i)}$ conditional upon $x_{(i)}$ falling at the
point x in S and let $F(r \mid x)$ be the conditional distribution function of $r_{(i)}$ for
$x_{(i)} = x$. Thus $F(r \mid x)$ is the conditional probability of $r_{(i)} \le r$ or the probability of
one or more of the (m - 1) random points other than $x_{(i)}$ falling inside a circle,
$C_r$, with radius r and center at x (see Fig. 1). Then, we have:

$E_x(r_{(i)}) = \int_0^{\infty} r\, dF(r \mid x),$

$F(r \mid x) = 1 - \left[1 - \frac{M(SC_r)}{M(S)}\right]^{m-1},$

where M(S) and $M(SC_r)$ are the measures of S and $SC_r$, so that $M(SC_r)/M(S)$ is the
probability of a random point in S falling into $C_r$.

$^1$ The lower bound obtained is similar in form to the expression for distance traveled
among a set of random points used by Mahalanobis [2] and Jessen [1].

Let A = M(S) and construct a circle C with center at x and radius $\rho = \sqrt{A/\pi}$.
Then M(C) = A = M(S). Let d be the distance from x to the nearest of
(m - 1) points selected at random from C and let G(r) be the distribution
function of d. Then we have:

$E(d) = \int_0^{\infty} r\, dG(r),$

$G(r) = 1 - \left[\frac{M(C) - M(CC_r)}{M(C)}\right]^{m-1}.$

For $r \le \rho$,

$M(C_rC) = M(C_r) \ge M(SC_r).$

For $r > \rho$,

$M(C_rC) = M(C) = M(S) \ge M(SC_r).$

Thus, since $M(C_rC) \ge M(SC_r)$, we have for all x in S:

$G(r) \ge F(r \mid x),$

and thus,

$E(d) \le E_x(r_{(i)}).$

Since $E(d) \le E_x(r_{(i)})$ for all x in S:

$E(d) \le E(r_{(i)}),$

$(m - 1)E(d) \le \sum_{i=1}^{m-1} E(r_{(i)}) \le E(L).$

It only remains to evaluate E(d), the expected distance from the center of a
circle to the nearest of (m - 1) random points. This can be done very easily
by substituting in the expression for G(r):

$A = M(C),$

$\pi r^2 = M(C_rC)$, when $r \le \rho = \sqrt{A/\pi}$,

to give:

$E(d) = \int_0^{\rho} r\,G'(r)\, dr = \frac{1}{2}\sqrt{A/\pi}\;[B(m, \tfrac{1}{2})],$

where $B(m, \tfrac{1}{2})$ is the complete Beta function.
Since $\sqrt{m}\,[B(m, \tfrac{1}{2})] > \sqrt{\pi}$:

$E(d) > \frac{1}{2}\sqrt{A/m}.$

Thus, we have:

$E(L) > \frac{1}{2}\sqrt{A}\;\frac{m - 1}{\sqrt{m}}.$


It is obvious that the development is general and applies to m random points 
in any bounded two-dimensional Borel set. However, the lower bound ob¬ 
tained will, in general, be useful only when S is a connected region. 
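
The evaluation of E(d) can be checked numerically. The sketch below rests on an assumption drawn from the derivation rather than from a legible printed formula: with $G(r) = 1 - (1 - \pi r^2/A)^{m-1}$ for $r \le \sqrt{A/\pi}$, integration gives $E(d) = \tfrac{1}{2}\sqrt{A/\pi}\,B(m, \tfrac{1}{2})$, and the resulting bound is $\tfrac{1}{2}\sqrt{A}(m-1)/\sqrt{m}$. The function names are hypothetical.

```python
import math

def beta_half(m):
    """Complete Beta function B(m, 1/2), computed through log-gamma for stability."""
    return math.exp(math.lgamma(m) + math.lgamma(0.5) - math.lgamma(m + 0.5))

def expected_nearest_distance(m, area):
    """E(d) for m - 1 random points in a circle of the given area (assumed closed form)."""
    return 0.5 * math.sqrt(area / math.pi) * beta_half(m)

def travel_lower_bound(m, area):
    """Assumed form of the lower bound: (1/2) sqrt(A) (m - 1) / sqrt(m)."""
    return 0.5 * math.sqrt(area) * (m - 1) / math.sqrt(m)

# Example: the m = 14 points of Fig. 1 in a region of unit area.
bound_14 = travel_lower_bound(14, 1.0)
```

For every m tried, $(m - 1)E(d)$ exceeds the bound, as the inequality for the Beta function requires.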

REFERENCES 

[1] Raymond J. Jessen, “Statistical investigation of a sample survey for obtaining farm
facts,” Iowa State College Research Bulletin 304 (1942).

[2] P. C. Mahalanobis, “A sample survey of the acreage under jute in Bengal,” Sankhya,
Vol. 4 (1940), pp. 511-530.


A MATRIX ARISING IN CORRELATION THEORY 1 
By H. M. Bacon 
Stanford University 

1. Introduction. In the study of time series, it is frequently desirable to
consider correlations between observations made in different years. Let $x_{i1}$,
$x_{i2}, \cdots, x_{im}$ be m values of the variable $x_i$, expressed as deviations from their
arithmetic mean, where $x_i$ is a variable observed in the ith year (i = 1, 2, $\cdots$, n).

$^1$ A linear correlogram is considered by Cochran in his paper, “Relative accuracy of
systematic and stratified random samples for a certain class of populations,” (Annals of Math.
Stat., Vol. 17 (1946), pp. 164-177) in which $\rho_u = 1 - u/L$. Setting $u = |i - j|$ and $L = 1/p$, we
have the case considered here.

Let $\sigma_i$ be the standard deviation of $x_i$. If we denote by $r_{ij} = r_{ji}$ the
correlation of $x_i$ with $x_j$, and if we assume the $x_i$ to be normally distributed, then

$z = \frac{1}{(2\pi)^{n/2}\sigma_1\sigma_2\cdots\sigma_n\sqrt{R}} \exp\left(-\frac{1}{2R}\sum_{i=1}^{n}\sum_{j=1}^{n} R_{ij}\frac{x_i x_j}{\sigma_i\sigma_j}\right)$

is the frequency function giving the distribution. Here R is the determinant
$|r_{ij}|$ of the correlation coefficients, and $R_{ij}$ is the cofactor of the element $r_{ij}$
in this determinant.

We may make various assumptions regarding the behavior of the correlation 
coefficients over the n years. One such assumption of some interest is that the 
correlation coefficients diminish in such a way that 


$r_{ij} = r_{ji} = 1 - |i - j|\,p$

where p is a fixed positive number not greater than 2/(n - 1). Under these
circumstances, we can evaluate R and $R_{ij}$ in terms of n and p.


2. Evaluation of R. We may let R(p) represent the determinant R of order
n whose element in the ith row and jth column is
$r_{ij} = r_{ji} = r_{n-i+1,\,n-j+1} = r_{n-j+1,\,n-i+1} = 1 - |i - j|\,p$ where, for the purpose of
evaluation, p is any real number. Since each two-rowed minor of R(p) is divisible by p,
R(p) is divisible by $p^{n-1}$. Furthermore, since R(p) is a polynomial in p of degree at
most n, we have

$R(p) = Ap^n + Bp^{n-1} = p^{n-1}(Ap + B).$

If we set p = 1 and p = -1, we find A + B = R(1) and $R(-1) = (-1)^{n-1}(-A + B)$
so that $-A + B = (-1)^{n-1}R(-1)$. By elementary methods we
find that $R(1) = 2^{n-2}(3 - n)$ and $R(-1) = (-1)^{n-1}2^{n-2}(n + 1)$. Hence

$A + B = 2^{n-2}(3 - n)$

and

$-A + B = 2^{n-2}(n + 1).$

Solving for A and B we find that

$R = R(p) = 2^{n-2}p^{n-1}[2 - (n - 1)p].$
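
The closed form for R can be confirmed by exact computation for small n. The following sketch is illustrative; the helper names are hypothetical, and the determinant is obtained by Gaussian elimination in rational arithmetic.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination; exact when the entries are Fractions."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]   # a row swap changes the sign
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def correlation_matrix(n, p):
    """Matrix with element 1 - |i - j| p in row i, column j."""
    return [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]

def closed_form_R(n, p):
    """2^(n-2) p^(n-1) [2 - (n-1) p], for n >= 2."""
    return 2 ** (n - 2) * p ** (n - 1) * (2 - (n - 1) * p)
```

The agreement holds for any real p, including the values p = 1 and p = -1 used in the evaluation.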

3. Evaluation of $R_{ij}$. Similar methods yield the following values for the
cofactors $R_{ij}$ of the elements of R:

$R_{11} = R_{nn} = 2^{n-3}p^{n-2}[2 - (n - 2)p],$

$R_{22} = R_{33} = \cdots = R_{n-1,\,n-1} = 2^{n-2}p^{n-2}[2 - (n - 1)p],$

$R_{1n} = R_{n1} = 2^{n-3}p^{n-1},$

$R_{i,\,i+1} = R_{i+1,\,i} = -2^{n-3}p^{n-2}[2 - (n - 1)p],$

otherwise,

$R_{ij} = 0.$
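
The cofactor values can likewise be checked exactly. The sketch below is illustrative and takes the formulas in the form $R_{11} = R_{nn} = 2^{n-3}p^{n-2}[2-(n-2)p]$, $R_{ii} = 2^{n-2}p^{n-2}[2-(n-1)p]$ for interior i, $R_{1n} = 2^{n-3}p^{n-1}$, $R_{i,i+1} = -2^{n-3}p^{n-2}[2-(n-1)p]$, and zero otherwise; it assumes n is at least 3, so that the corner and the adjacent elements are distinct. All names are hypothetical.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination; exact when the entries are Fractions."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def cofactor(matrix, i, j):
    """Signed minor for the element in row i, column j (both 0-based)."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(matrix) if r != i]
    return (-1) ** (i + j) * det(minor)

def expected_cofactor(i, j, n, p):
    """Closed-form cofactor for 1-based indices i, j and n >= 3."""
    two = Fraction(2)
    if i == j:
        if i in (1, n):
            return two ** (n - 3) * p ** (n - 2) * (2 - (n - 2) * p)
        return two ** (n - 2) * p ** (n - 2) * (2 - (n - 1) * p)
    if {i, j} == {1, n}:
        return two ** (n - 3) * p ** (n - 1)
    if abs(i - j) == 1:
        return -(two ** (n - 3)) * p ** (n - 2) * (2 - (n - 1) * p)
    return Fraction(0)

def cofactors_agree(n, p):
    """True when every computed cofactor matches its closed form."""
    m = [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]
    return all(cofactor(m, i, j) == expected_cofactor(i + 1, j + 1, n, p)
               for i in range(n) for j in range(n))
```

The vanishing of all cofactors away from the diagonal, the two adjacent diagonals, and the two corners is what makes the quadratic form of the next section so short.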





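4. The frequency function. The quadratic form appearing in the exponent
in the expression for the frequency function can now be written as

$\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{R_{ij}}{R}\,\frac{x_i x_j}{\sigma_i\sigma_j} = \frac{2 - (n-2)p}{2p[2 - (n-1)p]}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_n^2}{\sigma_n^2}\right) + \frac{1}{p}\left(\frac{x_2^2}{\sigma_2^2} + \frac{x_3^2}{\sigma_3^2} + \cdots + \frac{x_{n-1}^2}{\sigma_{n-1}^2}\right)$

$\qquad + \frac{1}{2[2 - (n-1)p]}\left(\frac{x_1 x_n}{\sigma_1\sigma_n} + \frac{x_n x_1}{\sigma_n\sigma_1}\right) - \frac{1}{2p}\left(\frac{x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2 x_1}{\sigma_2\sigma_1} + \frac{x_2 x_3}{\sigma_2\sigma_3} + \frac{x_3 x_2}{\sigma_3\sigma_2} + \cdots + \frac{x_{n-1} x_n}{\sigma_{n-1}\sigma_n} + \frac{x_n x_{n-1}}{\sigma_n\sigma_{n-1}}\right)$

$\qquad = \frac{1}{p}\left[\frac{2 - (n-2)p}{2[2 - (n-1)p]}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_n^2}{\sigma_n^2}\right) + \sum_{i=2}^{n-1}\frac{x_i^2}{\sigma_i^2} - \sum_{i=1}^{n-1}\frac{x_i x_{i+1}}{\sigma_i\sigma_{i+1}}\right] + \frac{1}{2 - (n-1)p}\,\frac{x_1 x_n}{\sigma_1\sigma_n}.$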


5. Maximum likelihood. The expression z is the likelihood of getting a
particular set of values of the variables $x_1, x_2, \cdots, x_n$. It is often important
to regard the $r_{ij}$ and the $\sigma_i$ as parameters and to determine them so that the
likelihood will be a maximum. If we assume $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$, then

$z = \frac{1}{(2\pi)^{n/2}\sigma^n\sqrt{R}} \exp\left(-\frac{1}{2R\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{n} R_{ij}\,x_i x_j\right).$

The question, in our case, now becomes, What values of p and $\sigma$ will make z
a maximum for given $x_i$? Necessary conditions are that $\partial z/\partial p = 0$ and
$\partial z/\partial\sigma = 0$.

Since $R_{ij}$ and R are given in terms of p, the process of differentiation can be carried
out (first take the logarithm of z), and values of p and $\sigma$ necessary for a maximum
determined. It is, of course, possible that z has no maximum, and the sufficiency
of these values must be tested. The computations for the general case are
laborious, though straightforward. Furthermore, because of the complicated
nature of the coefficients in the equation to be solved for p, the general solution
is not readily obtainable. This equation is, however, of third degree, and it can
be solved in any particular case.


TABLE OF NORMAL PROBABILITIES FOR INTERVALS OF VARIOUS 
LENGTHS AND LOCATIONS 

By W. J. Dixon 

University of Oregon 

1. Introduction. The probability associated with a particular finite range of
values is often desired. The usual tables of normal areas give values for $\int_0^x$ or
$\int_{-\infty}^{x}$, as in the table by Salvosa [1]. The WPA table [2] gives $\int_{-x}^{x}$. The author
has deposited with Brown University a table of $\int_{x - l/2}^{x + l/2}$ for values of x[0(.1)5.0]
and values of l[0(.1)10.0]. The values in the table may be interpreted as the
probability that an observation from a normal population with unit variance
will fall in an interval of length l whose midpoint is a distance x from the mean.
These values can be obtained by a simple computation from the existing tables.
Since values were being used frequently, the present table was constructed.
Microfilm or photostat copies may be obtained upon request to the Brown
University Library.


2. Computation. The values were obtained by finding the difference between
the integrals $\int_{-\infty}^{x - l/2}$ and $\int_{-\infty}^{x + l/2}$ as given to six decimal places in Salvosa's table.
Being differences, the values are subject to an error of 1 unit in the sixth place.
For values of $x + \tfrac{1}{2}l$ greater than 5, the values can be obtained by computing

(1) $1 - \int_{-\infty}^{x - l/2}.$

The search for errors was aided by computing column sums; i.e.

$\sum_{i=1}^{50} \int_{x_i - l/2}^{x_i + l/2} + \frac{1}{2}\int_{-l/2}^{l/2} = .5n,$

where i represents the row number and n represents the column number. For
example, n = 17 corresponds to the column for l = 1.7. The approximation becomes
poorer as n increases but the sums were still useful for checking purposes.
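
A table entry and the column check are readily reproduced by machine. The sketch below is illustrative: the normal integral is obtained from the error function, the rows are assumed to be x = 0.1, 0.2, ..., 5.0 with the row x = 0 entering at half weight, and the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Area under the unit normal curve from minus infinity to x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def table_value(x, l):
    """Probability of the interval of length l whose midpoint is at distance x from the mean."""
    return normal_cdf(x + l / 2) - normal_cdf(x - l / 2)

def column_check(n):
    """Sum down the column for l = n/10, with the x = 0 entry at half weight."""
    l = n / 10
    total = sum(table_value(i / 10, l) for i in range(1, 51))
    return total + 0.5 * table_value(0.0, l)
```

For n = 17 the check comes out within a few units in the fifth decimal place of 8.5, while for n = 100 it falls several units short of 50, in keeping with the remark that the approximation becomes poorer as n increases.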


3. Example. The table has been used in studies of the expected proportion
of a line covered by intervals dropped on it according to some normal probability
function. Let $P_n(x)$ be the probability that the point x is covered at least once
when n intervals are dropped on the x-axis. H. E. Robbins [3] gives the
expression:

(2) $E(F) = \frac{1}{L}\int_0^L P_n(x)\, dx,$

for the expected proportion of a line of length L covered at least once by these
intervals.

Let f(x) dx be the probability that an interval falls with its center in dx and l
be the length of the interval. The probability that a point x will be covered by
one interval dropped on the x-axis is:

(3) $g(x) = \int_{x - l/2}^{x + l/2} f(t)\, dt.$

When n intervals are dropped, the probability that x is covered at least once is

(4) $P_n(x) = 1 - (1 - g(x))^n,$

and

(5) $E(F) = 1 - \frac{1}{L}\int_0^L (1 - g(x))^n\, dx.$

When k groups of $n_i$ intervals are dropped according to, say, normal distributions
with different means,

(6) $P_n(x) = 1 - \prod_{i=1}^{k} (1 - g_i(x))^{n_i},$

where

(7) $g_i(x) = \int_{x - l/2}^{x + l/2} f_i(t)\, dt$

and we obtain

(8) $E(F) = 1 - \frac{1}{L}\int_0^L \prod_{i=1}^{k} (1 - g_i(x))^{n_i}\, dx.$

The values g(x) are those given in the table and are useful in evaluating the
integrals in (5) and (8) by numerical methods.
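
The numerical evaluation of (8) might proceed as in the following sketch, which is an illustration and not the author's procedure. It assumes that the interval centers in each group follow a normal distribution with unit variance and a given mean, and it uses the midpoint rule with an arbitrary 2000 subdivisions; the function names and arguments are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0):
    """Area under the unit-variance normal curve with the given mean, up to x."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2)))

def expected_coverage(L, l, groups, steps=2000):
    """Expected covered proportion of the line from 0 to L.

    groups is a list of (n_i, mean_i): n_i intervals of length l whose centers
    are normally distributed about mean_i with unit variance.
    """
    h = L / steps
    uncovered_area = 0.0
    for s in range(steps):
        x = (s + 0.5) * h                    # midpoint of the s-th subdivision
        uncovered = 1.0
        for n_i, mean_i in groups:
            g = normal_cdf(x + l / 2, mean_i) - normal_cdf(x - l / 2, mean_i)
            uncovered *= (1 - g) ** n_i
        uncovered_area += uncovered * h
    return 1 - uncovered_area / L

# Example: a line of length 4, intervals of length 1, two groups of five intervals.
proportion = expected_coverage(4.0, 1.0, [(5, 1.0), (5, 3.0)])
```

With a single group the same function evaluates (5), and the values of g are the entries of the table.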

REFERENCES 

[1] Luis R. Salvosa, “Tables of Pearson’s Type III functions,” Annals of Math. Stat., Vol.
1 (1930), p. 191.

[2] National Bureau of Standards, Tables of Probability Functions, Vol. 2 (1942).

[3] H. E. Robbins, “On the measure of a random set,” Annals of Math. Stat., Vol. 15 (1944).


CORRECTION TO “A NOTE ON THE FUNDAMENTAL IDENTITY OF 
SEQUENTIAL ANALYSIS” 

By G. E. Albert 
University of Tennessee

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), pp. 593- 
596), the proof of Lemma 3 is incorrect. The following correct proof is due to 
Mr. C. R. Blyth of the Institute of Statistics, University of North Carolina. 
It is easy to establish the equation 

$P(n = N \mid F)\,[\varphi(t_0)]^{-N} = P(n = N \mid G)\,E_{n=N}[\exp(-t_0 Z_N) \mid G],$

where $E_{n=N}(u \mid G)$ denotes the conditional expectation of u under the condition
that n = N for any fixed integer N. By Wald [2], equations (2.4) and (2.6),
there exists a finite constant C independent of N which dominates the expected
values $E_{n=N}[\exp(-t_0 Z_N) \mid G]$ for every N. Thus

(A) $P(n = N \mid F)\,[\varphi(t_0)]^{-N} \le C \cdot P(n = N \mid G).$





By Stein’s theorem [3], there is a positive number $t_1$ such that $E(\exp n t_1 \mid G)$ is
finite. But by (A),

$E\{\exp n[t_1 - \log \varphi(t_0)]\} \le C \cdot E(\exp n t_1 \mid G),$
and Lemma 3 is proved. 


CORRECTION TO “ON THE CHARLIER TYPE B SERIES” 

By S. Kullback
George Washington University 

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), p. 575), 
the phrase “so that . . . Z?i > 1” on lines 5 and 6 should be deleted. I am grate¬ 
ful to Prof. Ralph P. Boas, Jr. for calling this to my attention. 



ABSTRACTS OF PAPERS 

Presented June 22-24, 1948 at the Berkeley Meeting of the Institute 

1. Estimation of Parameters for Truncated Multinormal Distributions. Z. W.
Birnbaum, E. Paulson and F. C. Andrews, University of Washington.

Let $X^{(N)} = (X_1, \cdots, X_p, X_{p+1}, \cdots, X_N)$ be an N-dimensional random variable with a
non-singular normal distribution, and let the expectations, variances and covariances of
$X_{p+1}, \cdots, X_N$ be known. A large sample of $X^{(N)}$ is available, obtained under some side-
condition on $(X_{p+1}, \cdots, X_N)$; this side-condition may be a truncation of any kind or, more
generally, a selection, i.e. imposing on $X_{p+1}, \cdots, X_N$ a probability distribution different
from the original marginal distribution. A method is developed for estimating, from such a
large sample with a side condition, all the missing parameters of the original distribution of
$X^{(N)}$, that is the expectations, variances and covariances of $X_1, \cdots, X_p$, and the
covariances $\sigma_{X_j X_k}$ for $j = 1, \cdots, p$ and $k = p+1, \cdots, N$. This method does not require the
knowledge of the side-condition. (This paper was prepared under the sponsorship of the
Office of Naval Research.)

2. A Test of the Hypothesis that a Sample of Three Came from the Same Normal 
Distribution. Carl A. Bennett, General Electric Company, Hanford 
Works, Richland, Washington. 

In the control of the precision of chemical analyses performed in duplicate, a test some¬ 
times becomes necessary as to whether three determinations can reasonably be assumed to 
have arisen from the same normal population. A critical region for testing this hypothesis 
is given by $R > R_0$, where R = D/d, D being the maximum and d the minimum difference
between the three values, and $R_0$ is determined by integration over the upper tail of the
Cauchy distribution. It can easily be seen that this test is equivalent to a t-test between a
sample of one and a sample of two.
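
The equivalence stated above can be checked numerically. The sketch below assumes, since the abstract does not spell it out, that the "sample of two" is the closest pair of values and the "sample of one" is the remaining value. Under that assumption the two-sample t statistic (one degree of freedom) works out to $(2R - 1)/\sqrt{3}$, an increasing function of R. The function names are hypothetical.

```python
import math


def range_ratio(a, b, c):
    """R = D/d: largest over smallest pairwise difference of three values.
    Undefined (division by zero) when two values coincide."""
    x = sorted((a, b, c))
    D = x[2] - x[0]
    d = min(x[1] - x[0], x[2] - x[1])
    return D / d


def t_single_vs_pair(a, b, c):
    """Absolute t statistic comparing the outlying value with the closest pair
    (assumed reading of 'a sample of one and a sample of two')."""
    x = sorted((a, b, c))
    if x[1] - x[0] <= x[2] - x[1]:
        pair, single = (x[0], x[1]), x[2]
    else:
        pair, single = (x[1], x[2]), x[0]
    pair_mean = (pair[0] + pair[1]) / 2.0
    s = abs(pair[1] - pair[0]) / math.sqrt(2.0)   # pooled s.d., 1 d.f.
    standard_error = s * math.sqrt(1.0 + 1.0 / 2.0)
    return abs(single - pair_mean) / standard_error
```

For the values 1, 2, 5 one finds R = 4 and t = 7/√3, in agreement with the relation above.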

3. A Note on the Application of the Abbreviated Doolittle Solution to Non- 
Orthogonal Analysis of Variance and Covariance. Carl A. Bennett, 
General Electric Company, Hanford Works, Richland, Washington. 

S. S. Wilks has shown that the sums of squares necessary to the tests commonly made in 
non-orthogonal analyses of variance or covariance can in general be reduced to the ratio of 
two determinants. If several determinantal operations are performed to remove the 
singular principal minors, the abbreviated Doolittle solution yields these sums of squares 
directly. A combination of this technique and the calculational methods advanced by 
Wald and Yates greatly reduces the tedium of calculation in this type of analysis. 

4. Yield Trials with Backcrossed Derived Lines of Wheat. G. A. Baker and 
F. N. Briggs, University of California, Davis. 

Strains of White Federation 38 and Baart 38 Wheats derived by backcrossing sufficient 
to insure a high degree of homogeneity for all genetic factors were grown in conventional 
yield trials. The results were somewhat contradictory and led to a critical examination of 
such trials. The assumption that the deviations of yields in field trials from the specified 
pattern are random with uniform variance and expectation zero is not sufficiently realistic. 
We are led to consider a mathematical model which assumes a set of fertility levels upon



which a random element is superimposed. On the basis of this model it is possible to 
account for the low observed correlations between residuals and plot yields. In such a 
model the variance ratio F may be approximately unbiased but then its variance is smaller 
than under conventional assumptions. On the other hand, the expected value of F may be 
greater than one and sufficiently large so that “significant differences” between strains will
always be found due to the differences in fertility levels. In such cases the results of the 
experiment may be misinterpreted. Transformations, in the ordinary sense of the word, 
will not bring such data into conformity with the conventional model. In order to bring the 
correlation between residuals and plot yields down to a sufficiently low level it is necessary 
to concentrate most of the variation in fertility levels into a few plots. That this is not 
unreasonable is borne out by agronomic observations. This model also explains the 
absence of correlation between the yields of strains as determined in two different trials on 
the same set of strains. 


5. The Selection of the Largest of a Number of Means. Charles M. Stein, 
University of California, Berkeley. 


Suppose $X_{ij}$, $i = 1, \cdots, p$; $j = 1, 2, \cdots$ are independently normally distributed with
means $\xi_i + \eta_j$ and variances $\sigma_i^2$ where $\xi_i$, $\eta_j$ are unknown but $\sigma_i^2$ are known. $\epsilon$, $\alpha$ are fixed
numbers with $0 < \epsilon$, $0 < \alpha < 1$. It is desired to select, by a sequential procedure, in which
we take first the observations with second subscript 1, etc., an integer M among $1, \cdots, p$
such that, for every $k = 1, \cdots, p$ and $\xi_1, \cdots, \xi_p$, $\eta_1, \eta_2, \cdots$ satisfying $\xi_k \ge \xi_j + \epsilon$ for all
$j \ne k$, $P(M = k) \ge 1 - \alpha$. In accordance with the following rule, one decides at each stage
(after the observations with second subscript n) to take no more observations with certain
first subscripts. For each $n = 1, 2, \cdots$ and each $l = 1, \cdots, p$ compute





where $\bar{X}_j$ is the average of the observations with second subscript j and $t_j$ is the number of
such observations. Continue taking observations $X_{l,n+1}, \cdots$ for those l for which this
expression is greater than $(\ln \alpha)/\epsilon$ but not for the others. Eventually there will be at most
one subscript $l = 1, \cdots, p$ for which one continues to take observations and if there is one
this is chosen to be M. If there is none, the l for which the sum is largest is chosen to be M.
This procedure is a straightforward application of the Lemma on p. 146 of Wald’s Sequential
Analysis, and generalizations can easily be found.


6. The Effect of Inbreeding on Height at Withers in a Herd of Jersey Cattle.
W. C. Rollins, S. W. Mead, and W. M. Regan, University of California,
Davis. 


The data consist of measurements of height at withers of about 200 females for various 
ages from one month to five years. The intensity of inbreeding as measured by Wright’s 
coefficient of inbreeding averaged 141 per cent and reached as high as 44 per cent in a few
cases. 

An intra-sire covariance analysis of height and per cent of inbreeding was made for 
various ages from the first month to the fifty-fourth month. 

The results of the statistical analysis indicate that the inbred animals are shorter at one 
month of age and grow more slowly up to about the sixth month than do the outcrossed
animals, but that from the sixth month on the inbreds begin to catch up with the outcrossed 
so that at maturity there is no significant difference in height. 





7. An Example of a Singular Continuous Distribution. Henry Scheffé,
University of California at Los Angeles. 

Simple and “natural” examples of singular continuous probability distributions are of 
pedagogical interest. They are trivially available in the k-variate case for k > 1. A
univariate example may be obtained from the notion of a sequence of independent trials of 
an event with constant probability p of success, a notion familiar to the student and indis¬ 
pensable in elementary probability theory. The (real-valued) random variable X is taken 
to be the dyadic representation of the sequence of results (1 and 0, respectively, for success 
and failure). It is known that X has a singular continuous distribution for $p \ne 0, \tfrac{1}{2}, 1$.
This result may be proved by using only the Tchebycheff inequality together with the
formulas for the mean and variance of the binomial distribution.
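
The construction can be made concrete with a short sketch. It assumes that the "dyadic representation" means the number $X = \sum_i \epsilon_i 2^{-i}$ whose binary digits $\epsilon_i$ are the trial results, and it truncates the infinite sequence at a finite number of digits. The function names and the truncation length are choices made for this illustration only.

```python
import random


def dyadic_value(bits):
    """Number in [0, 1) whose binary expansion starts with the given digits."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))


def sample_x(p, n_bits=40, rng=random):
    """One draw of X, truncated to n_bits independent trials with success
    probability p (1 for success, 0 for failure)."""
    bits = [1 if rng.random() < p else 0 for _ in range(n_bits)]
    return dyadic_value(bits)
```

For instance, the results success, failure, success give `dyadic_value([1, 0, 1])`, that is 1/2 + 1/8.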

8. On the Theory of Some Non-Parametric Hypotheses. Erich L. Lehmann
and Charles Stein, University of California, Berkeley, California. 

For two types of non-parametric hypotheses optimum tests are derived against certain 
classes of alternatives. The two kinds of hypotheses are related and may be illustrated by 
the following example: (1) The joint distribution of the variables $X_1, \cdots, X_m$, $Y_1, \cdots, Y_n$
is invariant under all permutations of the variables; (2) the variables are independently
and identically distributed. It is shown that the theory of optimum tests for hypotheses 
of the first kind is the same as that of optimum similar tests for hypotheses of the second 
kind. Most powerful tests are obtained against arbitrary simple alternatives, and in a 
number of important cases most stringent tests are derived against certain composite 
alternatives. For the example (1), if the distributions are restricted to probability densities,
Pitman’s test based on $\bar{y} - \bar{x}$ is most powerful against the alternatives that the X’s and
Y’s are independently normally distributed with common variance, and that $E(X_i) = \xi$,
$E(Y_i) = \eta$ where $\eta > \xi$. If $\eta - \xi$ may be positive or negative the test based on $|\bar{y} - \bar{x}|$
is most stringent. The definitions are sufficiently general that the theory applies to both 
continuous and discrete problems, and that tied observations present no difficulties. It is 
shown that continuous and discrete problems may be combined. Pitman’s test for example, 
when applied to certain discrete problems, coincides with Fisher’s exact test, and when 
m = n the test based on $|\bar{y} - \bar{x}|$ is most stringent for hypothesis (1) against a broad class of
alternatives which includes both discrete and absolutely continuous distributions. 

9. Concerning Compound Randomization in the Binary System. John E. Walsh,
Project Rand, Santa Monica, California. 

Consider a set of binary digits. The numerical deviation from 1/2 of the conditional
probability that a specified digit equals 0 is called the bias of that digit for the given
conditions on the remaining digits of the set. The maximum bias of the set is defined to be the
maximum of the biases of the digits of the set. A set of binary digits is called random if its 
maximum bias is zero. Now consider an array of $(1 + t_1) \cdots (1 + t_K) \times n$ binary digits
such that the rows are statistically independent. A compounding method of obtaining a
set of $t_1 \cdots t_K n$ binary digits from the original array is presented. By suitable choices of
$K, t_1, \cdots, t_K$ the maximum bias of the compounded set can be made extremely small even
if the maximum bias of the original array is not small; this can be done so that
$t_1 \cdots t_K / [(1 + t_1) \cdots (1 + t_K)]$ is moderately large. Also a method is outlined for constructing an
approximately random binary digit table. This table has the property that the maximum 
bias of a set of digits taken from the table is an increasing function of the number of digits 
in the set. 





10. A Multiple Decision Problem Arising in the Analysis of Variance. Edward 
Paulson, University of Washington, Seattle. 

In some applications of the analysis of variance, a procedure is required for classifying
varieties into ‘superior’ and ‘inferior’ groups. Consider K varieties, with $x_{i\alpha}$ the $\alpha$th observation
on the ith variety ($\alpha = 1, 2, \cdots, r$; $i = 1, 2, \cdots, K$), let $\bar{x}_i = \sum_\alpha x_{i\alpha}/r$ and let $s^2$ be an
independent estimate of the variance. For the ith variety form the corresponding interval
$\left(\bar{x}_i - \lambda s/\sqrt{r},\ \bar{x}_i + \lambda s/\sqrt{r}\right)$. The superior group then consists of the variety with greatest
sample mean, together with those varieties whose corresponding intervals have at least one
point in common with the interval for the variety with the greatest mean. If all varieties
fall into one group, this group is labeled ‘neutral’ and the varieties are considered homogeneous.
To select $\lambda$, consider the relative importance of different incorrect classifications.
For a given $\lambda$, an explicit expression is found for P(A), the probability the varieties will not
all be classified in one group when $m_1 = m_2 = \cdots = m_K$, where $m_i = E(\bar{x}_i)$; also explicit
expressions are found for $P(B_1)$ and $P(B_2)$, where $P(B_1)$ is the probability there will not be a
superior group consisting only of the Kth variety and $P(B_2)$ is the probability there will
not be a superior group consisting of at least the Kth variety, when $m_1 = m_2 = \cdots = m_{K-1} = m$
and $m_K = m + \Delta$ ($\Delta > 0$). Similar results are obtained for classifying K processes
according to their variances. 
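
The classification rule described in this abstract can be sketched as follows. This is a reading of the abstract rather than the author's procedure. The function name is hypothetical, the constant is passed in as `lam` with no rule for choosing it, and intervals that merely touch at an endpoint are counted as having a point in common.

```python
import math


def superior_group(means, s, r, lam):
    """Indices of the varieties in the superior group, plus a label.

    Each variety i gets the interval means[i] -/+ lam * s / sqrt(r).  Two such
    intervals of equal half-width intersect exactly when their centers differ
    by at most twice the half-width.
    """
    half_width = lam * s / math.sqrt(r)
    best = max(means)
    superior = [i for i, m in enumerate(means) if best - m <= 2.0 * half_width]
    label = "neutral" if len(superior) == len(means) else "superior"
    return superior, label
```

With sample means 10, 9.5 and 7, s = 1, r = 4 and lam = 1, the half-width is 0.5, so the first two varieties form the superior group and the third is inferior; a much larger lam puts all varieties in one neutral group.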


11. Recurrence Formulae for the Moments and Semi-variants of the Joint 
Distribution of the Sample Mean and Variance. Olav Reiersøl, University
of Oslo, Norway. 

Let $x_1, x_2, \cdots, x_n$ be independent and having the same distribution. We consider the
arithmetic mean m and the variance $v = (1/(n - 1)) \sum (x_i - m)^2$. Let $\kappa_{rs}$ denote the
seminvariants of the joint distribution of m and v, and let the seminvariant generating operators
K be defined by the equations: $\kappa_{r+1,s} = K_1 \kappa_{rs}$, $\kappa_{r,s+1} = K_2 \kappa_{rs}$, $K_i 1 = 0$,
$K_i(PQ) = P(K_i Q) + Q(K_i P)$. An operator which operates only on the first factor of a product shall be
denoted by a prime, and an operator which operates only on the second factor shall be 
denoted by a double prime. We have the following general formula, valid for any parent 
distribution: K[[(n — 1)(/Ca + — 2 n(K[ -f k[ q ){K[' + k(£)]*[L (koi — niC 2 o) + 

n(«io.itio — l-xio)] = 0. For s =* 0, 1, 2, we obtain the formulae, K[(kq i — mejo) * 0, 
K[[( n ~~ 1)(<02 — n* 2 i) — 2n^ 0 ] 8=1 0, K\[(n — l) 2 (<«os — nun) — 8n 2 (n — l)« 2 HC 2 o + 4n*(n — 1) 
k\ 0 - 8n*(n - l)icj 0 ] = 0. 

12. The Problem of Identification in Factor Analysis. Olav Reiersøl,
University of Oslo, Norway.


The paper is concerned with the multiple factor analysis of L. L. Thurstone. Thurstone
has given criteria which he says are almost certain to constitute sufficient and more than 
necessary conditions for uniqueness (i.e. identifiability) of a simple structure. It is shown 
that Thurstone’s criteria are not always sufficient, and conditions are derived which are 
more nearly necessary and sufficient for the identifiability of a simple structure. Let A 
be the matrix of factor loadings with n rows and r columns. When the communalities are 
identifiable, the conditions will be: (1) Each column of A should have at least r zeros. 
(2) Let us consider the submatrix B of A, consisting of all the rows which have zeros in the 
kth column. Then, for $q = 1, 2, \cdots, r - 1$, there should for any combination of q columns
different from the kth, exist at least q + 1 rows of B containing non-zero elements in the q
columns. This should be true for any value of k. 





13. Note on Distinct Hypotheses. Agnes Berger, Columbia University, New 
York. 

As was pointed out by Neyman, one of the difficulties which one may encounter when 
devising a test to distinguish between two exhaustive and exclusive composite hypotheses 
referring to the unknown distribution of a random vector X is the following: If $H_0$ states
that the true distribution function of X belongs to a set {F} and $H_1$ that it belongs to a
set {G}, it may happen that to every Borel set W of the sample space there exists an element
$F_W$ in {F} and an element $G_W$ in {G} for which the probability of the sample point z falling
on W is the same and therefore independent of whether $H_0$ or $H_1$ is true. If this is the case
the pair $H_0$, $H_1$ is called non-distinct, otherwise they are called distinct. The existence of
non-distinct hypotheses is demonstrated by a simple example, $H_0$ consisting of one, $H_1$
of three suitably chosen step functions. It is shown however that if the sets {F} and {G}
contain only continuous distribution functions and are at most enumerable then the pair
$H_0$, $H_1$ is distinct. Necessary and sufficient conditions for $H_0$ and $H_1$ to be distinct were
obtained jointly with Wald for an important class of hypotheses each containing a con¬ 
tinuum of alternatives. 

14. Place of Statistical Sampling in the Education of Engineers. E. L. Grant, 
Stanford University. 

There is convincing evidence that many engineering problems could be solved better 
with the aid of statistical methods than they are now solved without this aid. However, 
few practising engineers or teachers of engineering have had any training in statistical 
methods. As a result, those engineering problems which are in part statistical problems 
are seldom recognized as such. Even in the field of industrial quality control, in which 
successful applications of some of the simpler statistical techniques have been made in 
many different industries, the surface has barely been scratched and a serious obstacle to 
progress is the lack of a widespread appreciation of the statistics point of view among 
design engineers, production engineers, inspection personnel, and management. 

This condition might gradually be corrected if during the next few years instruction in 
statistics should be introduced into all undergraduate engineering curricula. However, 
some recent discussions touching on the subject of statistics instruction for engineering 
students (e.g., the report on “The Teaching of Statistics” which appeared in the March
1948 issue of the Annals of Math. Stat.) have been most unrealistic regarding the amount of 
statistics instruction which could be added to engineering curricula. These discussions 
have suggested a full year of basic statistics followed by one or more courses in engineering 
applications. Desirable as this arrangement might be from the point of view of the most 
effective instruction in statistics, it is out of the question when considered in the light of 
the many subjects which are needed in engineering curricula. Although undergraduate 
engineering curricula have always been tighter than other curricula, the pressures today 
are greater than ever before—for more time devoted to the humanistic-social stem, for 
more time in basic mathematics and science, for introductory courses in various economic 
and management subjects such as engineering economy, accounting, industrial relations, 
business law, and industrial organization and management, and for more time in the various 
departmental courses in engineering subjects in order to permit presentation of important 
recent developments in engineering technology. Under these circumstances the most that 
can be hoped for in the undergraduate program is a single statistics course for one term, 
possibly three units for one semester or four units for one quarter. This should be supple¬ 
mented by additional statistics instruction for some graduate students in engineering. 
A few engineering graduates should be encouraged to take graduate degrees in statistics 
and to make careers in the field of applied statistics. 





In a successful undergraduate statistics course for engineering students, the problems 
and illustrations should be selected with two purposes in mind. One purpose, of course, 
should be to develop the principles of probability and statistical method. The other, 
equally important if these engineering graduates are to persuade their colleagues and 
superiors to adopt the statistics point of view in approaching engineering problems, should 
be to demonstrate how statistical method provides a useful guide to action in many different 
engineering situations. Applications of statistics to industrial quality control provide 
particularly good problems and illustrative examples which serve this second purpose. 

15. Statistical Problems of Medical Diagnosis. Jerzy Neyman, University of 
California, Berkeley. 

“Diagnosis” is used to describe the outcome of a strictly defined test T, such as the Wassermann
test, which may lead to either of two possible outcomes, “positive” or “negative”.
Cases contemplated are such that at the time the test T is performed it is impossible to 
verify its verdict for certain and the best one can do is to repeat the test. It is postulated 
that to each individual of a population there corresponds a probability p that the test T 
will give a positive outcome. The value of p may vary from one individual to another. 
It is presumed that as p increases, the illness in the patient increases. Problem of compari¬ 
son of two alternative tests and problem of estimating the distribution of p reduces to 
problems relating to the distribution of X = number of positive outcomes in n independent 
diagnoses. Statistical machinery suggested is that of BAN estimates (Public Health
Reports, Vol. 62 (1947), p. 1449). Principal result reported is that, with the mathematical
model used in the paper quoted, the empirical variances of four BAN estimates computed 
for 205 samples of 1000 elements each agreed reasonably with the theoretical asymptotic 
values. Empirical distributions of three of these estimates did not show deviations from 
normality. That of the fourth was non-normal. It seems therefore that the asymptotic 
procedure of BAN estimate may be adequate for similar analyses. 

16. Power of Certain Tests Relating to Medical Diagnosis. C. L. Chiang and 
J. L. Hodges, Jr., University of California, Berkeley. 

Associate with each individual in a population $\pi$ the probability p that he will be found
tubercular when examined by a standard X-ray technique. Yerushalmy and others [J. Am.
Med. Assn., Vol. 133 (1947), p. 359] performed 5 independent such diagnoses on each of 1256
persons. Neyman [Public Health Reports, Vol. 62 (1947), p. 1449] proposed a simple four-
parameter model for the distribution of p in $\pi$, estimated the parameters from the data of
Yerushalmy and others, and obtained a satisfactory fit. In the present paper, the work of 
Neyman is paralleled with four new models, all giving satisfactory fit with the same data. 
The five models differ considerably in shape, and in the number of repeated diagnoses which 
they indicate to be necessary to detect a high proportion of those individuals having, say, 
$p \ge 0.1$. Therefore further preliminary study seems indicated before one can design a mass
survey to detect a high proportion of such persons. The approximate power of the $\chi^2$ test
of the Neyman model is considered, using one of the other models as alternative. It is 
found that to obtain power 0.7 with level of significance 0.05, it would be necessary to diag¬ 
nose 5290 individuals 5 times each. 

17. Iterative Treatment of Continuous Birth Processes. T. E. Harris, 
Project Rand, Santa Monica, California. 

Random variables $z_n$ are defined by $z_0 = 1$; $P(z_1 = r) = p_r$, $r = 1, 2, \cdots$; if $z_n = k$,
$z_{n+1}$ is the sum of k independent variates, each distributed like $z_1$. Let $x = \sum_{r=1}^{\infty} r p_r < \infty$;
$\sum r^2 p_r < \infty$; $0 < p_1 < 1$. The generating function $f(s) = \sum p_r s^r$ is said to be C.I. if there
exists a family of generating functions $f(s, t)$ with $f(s, 1) = f(s)$, $f[f(s, t), t'] = f(s, tt')$ for all
nonnegative t and t'. A necessary and sufficient condition that $f(s)$ be C.I. is that the
numbers $a_r$, $r = 2, 3, \cdots$, be nonnegative; the $a_r$ are determined recursively by requiring
that the power series $\xi(s) = -s + \sum a_r s^r$ satisfy formally the functional equation
$\xi(s) f'(s) = \xi[f(s)]$. The problem is connected with classical works on iteration. If $f(s)$ is C.I., the
given Markoff process can be imbedded in a continuous birth process. If $\xi(s)$ is given, the
m.g.f. $\phi(s)$ of the asymptotic distribution of the variate $z_n/x^n$ may be determined from the
formula [display illegible in the source; the surviving fragment reads “$(s - 1) \exp$ …”].
Various properties of the corresponding distribution can be inferred from this expression.


18. Estimation of Means on the Basis of Preliminary Tests of Significance. 
Blair M. Bennett, University of California, Berkeley. 


This paper examines the statistical procedure of pooling two sample means on the basis 
of the results of one or more preliminary tests of significance. Let $x_i$ ($i = 1, \cdots, N_1$)
represent a sample of $N_1$ observations from a normal population $\pi_1(\xi, \sigma_1)$, and $y_j$ a sample of
$N_2$ observations from $\pi_2(\eta, \sigma_2)$. An estimate of $\xi$ which is commonly used in certain practical
situations is given by: $x' = \bar{x}$, or $x' = (N_1\bar{x} + N_2\bar{y})/(N_1 + N_2)$, according as the sample
means $\bar{x}$, $\bar{y}$ do or do not differ significantly on the basis of a preliminary test. The distribution
of the estimate $x'$ is determined, according as $\sigma_1 = \sigma_2$ are known or unknown. In both
situations, the maximum (or minimum) bias is computed as a function of various levels of
significance of the preliminary test of equality of means. Also, the mean square error of
the estimate $x'$ is calculated in both cases. If now equality of variances cannot be assumed,
but an F-test of the sample variances $s_1^2$, $s_2^2$ does not indicate any significant difference,
then in practice $\bar{x}$, $\bar{y}$ may be pooled, the weights being inversely proportional to the sample
variances. Thus, the usual estimate of $\xi$ will be of the form: $x' = \bar{x}$, or
$x' = (N_1\bar{x}/s_1^2 + N_2\bar{y}/s_2^2)/(N_1/s_1^2 + N_2/s_2^2)$, according as $\bar{x}$ and $\bar{y}$ do or do not differ significantly on the
basis of the Student t-test, subsequent to an F-test. The bias and mean square error of this
estimate have been computed with the aid of the conditional power function of the t-test
subsequent to an F-test. 
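
The first estimate described above, for the case of a known common standard deviation, might be sketched as follows. The abstract does not state which preliminary test is used; a two-sided normal-theory test with critical value 1.96 is assumed here purely for illustration, and the function name is hypothetical.

```python
import math


def estimate_after_preliminary_test(xbar, ybar, n1, n2, sigma, z_crit=1.96):
    """Return xbar if the two sample means differ significantly, otherwise the
    pooled mean (n1 * xbar + n2 * ybar) / (n1 + n2).

    sigma is the known common standard deviation; z_crit is the assumed
    two-sided critical value of the preliminary test.
    """
    z = (xbar - ybar) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
    if abs(z) > z_crit:
        return xbar
    return (n1 * xbar + n2 * ybar) / (n1 + n2)
```

With 25 observations in each sample and sigma = 1, means of 10 and 10.1 are pooled to 10.05, whereas means of 10 and 12 differ significantly and the estimate stays at 10. The bias studied in the abstract arises because the choice between the two forms depends on the data.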

19. Note on Power of the F Test. Stanley W. Nash, University of California,
Berkeley. 


Assuming “treatment” expectations to be normal random variables, the ratio of the 
sum of squares due to treatments to the sum of squares due to error has a central F dis¬ 
tribution in the cases of randomized blocks, Latin squares, and one-way classifications. 
The F statistic converges in probability to a constant as the number of treatments is in¬ 
creased. This is one plus a multiple of the variance between treatment expectations. 
The power of the F test increases monotonely to one as the number of treatments is in¬ 
creased. This power can be calculated using tables of the incomplete beta function. 

20. Best Asymptotically Normal Estimates. E. W. Barankin and J.
Gurland, University of California, Berkeley. 

The methods of minimum $\chi^2$ developed by Neyman for obtaining BAN (best asymptotically
normal) estimates of the parameters appearing in the multinomial distribution
are generalized to obtain certain optimum types of estimates in the case of an arbitrary
distribution under certain restrictions. Let the random vector X have the probability
density $v(x; \theta)$ in the absolutely continuous case and let $v(x; \theta) = P[X = x \mid \theta]$ in the discrete
case, where $\theta$ is a fixed vector in the parameter space. Functions $\phi_i(X)$ ($i = 1, 2, \cdots, r$)
are selected for the purpose of forming estimates; these estimates are taken to be functions
of the sample moments $\frac{1}{n}\sum_{j=1}^{n} \phi_i(x_j)$. Certain quadratic forms which depend on the
choice of functions $\phi_1(X), \phi_2(X), \cdots, \phi_r(X)$ are minimized with respect to the parameters.
In this manner, asymptotically normal estimates are obtained which are consistent, and
have minimum asymptotic variances within the class of estimates so determined by the
particular functions $\phi_1, \phi_2, \cdots, \phi_r$. It is possible, through a modification of this procedure,
to obtain estimates by solving a set of linear equations. If $v(x; \theta)$ has the form

$v(x; \theta) = \exp\{\beta_0(\theta) + \sum_{i=1}^{s} \beta_i(\theta)\gamma_i(x) + \gamma_0(x)\}$

it can be shown that the best choice of the $\phi$’s is $\gamma_1(x), \gamma_2(x), \cdots, \gamma_s(x)$. Maximum likelihood
estimates belong to this class of BAN estimates.



BOOK REVIEW 


The Theory of Games and Economic Behavior. John von Neumann and Oskar
Morgenstern. Princeton University Press, 1947; Second Edition, Pp. xviii, 
641. $10.00 


Reviewed by Leonid Hurwicz 1 
Iowa State College 

This review is devoted to the second edition of a book which from its first 
appearance was acknowledged to be a major contribution in the field of theory 
of rational behavior. As is pointed out in the Preface, “the second edition 
differs from the first in some minor respects only”. The main change is the 
addition of a proof (of “measurability” of utility) omitted in the first edition. 

The book’s objective is to solve the problem of rational behavior in a very 
general type of situation. 

It is, therefore, not surprising that its results are of relevance in many fields 
of knowledge, among them economics and statistical inference. 

In both economics and statistics the problem of rational behavior is a funda¬ 
mental one. Thus one of the classical problems treated by the economic theory 
is that of profit maximization by a firm. The firm is assumed to be maximizing 
its net profit which is a function of prices of the product, materials used, etc., as 
well as the quantities used and produced. In the simplest case prices are taken 
as given; more generally they are assumed to be functions (known to the firm) 
of the quantities sold and purchased. But assuming this function to be known 
presupposes the knowledge of behavior of other firms. This procedure has for 
a long time been regarded as highly unsatisfactory; it is analogous to elaborating 
the theory of rational behavior of a poker player on the assumption that he knows 
the strategy of the other players! 

It is the type of situation in which not only the behavior of various individ¬ 
uals, but even their strategies, are interdependent, that is treated by von Neu¬ 
mann and Morgenstern. The essence of their solutions is to base the optimal 
strategy on the minimax principle. As applied to a game, the principle requires
that one should choose a strategy which minimizes the maximum loss
that could be inflicted by the opponent. 

The minimax principle, when applied by both players need not, in general, 
lead to a stable solution. To ensure the existence of such a solution the authors 
are led to the postulate that the choice of strategies be made through a random 
process. The minimax to be found is that of the mathematical expectation of
the loss in the game. The latter postulate is of a restrictive nature* since it 
implies that the game is played for numerical (“measurable”) stakes and that 

1 On leave to the United Nations Economic Commission for Europe. 

* See Jacob C. Marschak, “Neumann’s and Morgenstern’s New Approach to Static 
Economics”, The Journal of Political Economy , Vol. LIV (1946). 



the second and higher moments of the probability distribution of the losses are 
immaterial. This restriction, however, has permitted the authors to go deeper 
in other directions. Given the great complexity of the problem, even in its 
restricted version, the authors’ decision can hardly be criticized. One could 
only wish that similar considerations had made the authors more tolerant towards 
other work in the field of economics than is shown in some sections of the book. 
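
The role of randomization can be illustrated on a small two-person zero-sum game. The sketch below is not taken from the book: the matrix convention (entries are losses of the row player), the function names, and the matching-pennies example are assumptions for this illustration. The closed form for the mixed strategy holds only for a 2 x 2 game without a saddle point.

```python
def pure_security_levels(loss):
    """Return (minimax, maximin) over pure strategies.

    loss[i][j] is the row player's loss when row i meets column j.  The row
    player minimizes the worst loss; the column player maximizes the least.
    A stable pure solution exists only when the two values are equal.
    """
    minimax = min(max(row) for row in loss)
    maximin = max(min(row[j] for row in loss) for j in range(len(loss[0])))
    return minimax, maximin


def mixed_2x2(loss):
    """Probability of playing row 0 that equalizes the expected loss against
    both columns, and the resulting value (2 x 2 game, no saddle point)."""
    (a, b), (c, d) = loss
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value
```

For matching pennies, with losses [[1, -1], [-1, 1]], the pure-strategy values are 1 and -1, so no pure strategies are stable; randomizing with probability 1/2 brings the expected loss to 0 whatever the opponent does.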

The readers of the Annals will be particularly interested in the connection 
between the Theory of Games and the theory of statistical inference. 

As has been pointed out by Abraham Wald 3 the problem faced by the statisti¬ 
cian is somewhat similar to that of a player in a game of strategy. The theory 
of statistical inference may be viewed as a theory of rational behavior of the 
statistician. His “strategy” consists in adopting an optimal test or estimate, 
more generally an optimal decision function. This optimal decision function 
must be chosen without the knowledge of the “a priori” distribution of the pop¬ 
ulation parameters. Wald’s basic postulate of minimization of maximum risk 
is equivalent to regarding the statistician as a player in a game of strategy, with 
“Nature” as the other player. The optimal decision function is chosen in a 
way which (as shown by Wald) is equivalent to assuming the “least favorable” 
a priori distribution of the parameters. As Wald says, “we cannot say that 
Nature wants to maximize [the statistician’s risk]. However, if the statistician 
is completely ignorant as to Nature’s choice, it is perhaps not unreasonable to 
base the theory of a proper choice of [the decision function] on the assumption 
that Nature wants to maximize [the statistician’s risk]”.
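
The idea of choosing a decision function by its maximum risk can be illustrated with a standard textbook case that is not taken from the review or from Wald's paper: estimating a binomial proportion under squared-error loss. The sketch below assumes that loss function, a sample of n = 16 trials, and hypothetical function names; it compares the usual estimate x/n with an estimate whose risk is the same for every value of the parameter.

```python
import math


def binom_pmf(n, x, p):
    """Probability of x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def risk(estimator, n, p):
    """Expected squared error of estimator(x, n) when X is Binomial(n, p)."""
    return sum(binom_pmf(n, x, p) * (estimator(x, n) - p) ** 2 for x in range(n + 1))


def mle(x, n):
    """The usual estimate: the observed proportion."""
    return x / n


def constant_risk(x, n):
    """Estimate shrunk toward 1/2; its squared-error risk does not depend on p."""
    r = math.sqrt(n)
    return (x + r / 2) / (n + r)


def max_risk(estimator, n, grid=101):
    """Largest risk over an evenly spaced grid of p values in [0, 1]."""
    return max(risk(estimator, n, i / (grid - 1)) for i in range(grid))


n = 16
print(max_risk(mle, n))            # worst case occurs at p = 1/2
print(max_risk(constant_risk, n))  # the same at every p on the grid
```

In this example "Nature" can hurt the observed proportion most by choosing p = 1/2, while the shrunken estimate gives up some accuracy near p = 0 and p = 1 in exchange for a smaller worst case.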

It may be noted, however, that statistical inference, as seen by Wald, is a 
relatively simple game since it involves only two players and is of the zero-sum 
variety. 

The admiring and enthusiastic reception given to the book’s first edition would 
make any further general appraisal somewhat anticlimactic. Suffice it to say
that a good deal of valuable work has already been stimulated by the Theory of
Games, both in the field of social sciences and in mathematics.

³ Abraham Wald, “Statistical Decision Functions which Minimize the Maximum Risk”,
Annals of Mathematics, Vol. 46 (1945).



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Paul H. Anderson, formerly an Economist with the War Assets Adminis¬ 
tration, Washington, D. C., has been appointed Professor of Marketing at 
Loyola University, New Orleans, Louisiana. 

Mr. N. H. Carrier has resigned his position with the Mathematical Statistics 
Section, Chief Scientific Advisers Division, Ministry of Works, England to ac¬ 
cept an appointment as Statistician in the General Register Office, Somerset 
House, Strand, London, W. C. 2, England. 

Dr. T. Freeman Cope has been promoted to a full professorship at Queens 
College, Flushing, New York. 

Dr. Wayne W. Gutzman, who was formerly at the Postgraduate School, 
Naval Academy, Annapolis as an Assistant Professor, has accepted a professor¬ 
ship in the Department of Mathematics, University of South Dakota. 

Mr. Elvin A. Hoy has transferred from the position as Chief, Statistics Section,
Bureau of Research and Statistics in the Social Security Administration to
the position as Chief, Research Evaluation Section, Naval Reserve Training 
Publications, Navy Department, Naval Gun Factory, Washington, D. C. 

Dr. Joe J. Livers has been promoted to a full professorship at Montana State 
College, Bozeman, Montana. 

Professor Ernest S. Keeping has returned to his position at the University of 
Alberta, Edmonton, Alberta, Canada after having spent the spring term of 1948 
at the Institute of North Carolina. 

Mr. Wharton F. Keppler of the M&R Dietetic Laboratories, Inc., Columbus, 
Ohio has recently qualified as a Professional Industrial Engineer in the State of 
Ohio. 

Mr. Ralph Mansfield has formed his own company to manufacture electrical 
testing equipment. The company is known as the Auto-Test, Incorporated, 
with Mr. Mansfield acting as Vice-president and Chief Engineer. 

Mr. Jack Moshman has resigned an instructorship in mathematics at the 
University of Tennessee to accept the post of Statistician to the Medical Advisor, 
United States Atomic Energy Commission, Oak Ridge, Tennessee. 

Mr. Bernard E. Phillips has resigned his position as teacher of mathematics 
in the Newark, New Jersey high schools to do statistical work for the Glenn L. 
Martin Co., Baltimore, Maryland. 

Dr. W. R. Van Voorhis, Associate Professor of Mathematics, Fenn College, 
attended, as a representative of the Institute of Mathematical Statistics, the 
inauguration ceremonies of Dr. Keith Glennan as President of Case Institute of 
Technology, Cleveland, Ohio. 




Atomic Energy Commission Fellowships 

The National Research Council is announcing a new program of fellowships 
supported by funds provided by the Atomic Energy Commission as a part of the 
Commission’s responsibility for future atomic energy research. Accordingly, 
these fellowships will be awarded in the many fields of science related to research 
in atomic energy. 

A considerable number of these fellowships is available to young men and 
women who wish to continue in graduate training or research for the doctorate 
in an appropriate field of science. Others of these fellowships will provide train¬ 
ing in biophysics applied to the control of radiation hazards. An additional 
number of fellowships will be assigned to those below the age of 35 who have 
already achieved the doctorate and who wish to secure advanced research train¬ 
ing and experience in those aspects of the physical, biological and medical 
sciences related to atomic energy. Tenure of the fellowship does not impose on 
the fellow any commitment with regard to subsequent employment. 

The candidates will be selected by the fellowship boards of the National Re¬ 
search Council established for this program. In the postdoctoral field, there 
will be three groups of fellowships, the basic stipend of which will be $3000. For 
the selection of fellows for advanced research and training in the general field of 
the physical sciences, a board has been established with Dr. Roger Adams, 
Professor of Chemistry, University of Illinois, as chairman. In the general 
field of the biological sciences, exclusive of the medical sciences, selections of 
postdoctoral fellows will be made by a board under the chairmanship of Dr. R. 
G. Gustavson, Chancellor of the University of Nebraska. For the selection of 
postdoctoral fellows in the medical sciences, a board has been set up under the 
chairmanship of Dr. Homer W. Smith, Professor of Physiology, College of 
Medicine, New York University. 

The program provides for two groups of fellows in the predoctoral field, with 
stipends ranging from $1500-2100. One group of fellows will work in the bio¬ 
logical and basic medical sciences including applied biophysics related to the 
measurement and control of radiation hazards and the effect of radiation upon 
health. Selections will be made by a board under the chairmanship of Dr. 
Douglas Whitaker, Professor of Biology, and Dean of the School of Biological 
Sciences, Stanford University. Another group of predoctoral fellows will be 
selected for study and research in the general field of the physical sciences. 
Selections will be made by a board under the chairmanship of Dr. Henry A. 
Barton, Director of the American Institute of Physics. 

Fellowships will be granted for study and research in universities or other 
nonprofit research establishments approved by the fellowship boards. Awards 
will be made for the academic year 1948-49. Supervision of a fellow’s program 
of work will be under the direction of the fellowship boards of the National 
Research Council. Further information can be secured by writing to the 
Fellowship Office, National Research Council, 2101 Constitution Avenue, Wash¬ 
ington 25, D. C. 





Research Fellowships in Psychometrics 

The Educational Testing Service, Princeton, N. J., is offering for 1949-50 its 
second series of research fellowships in psychometrics leading to the Ph.D. 
degree at Princeton University. Open to men who are acceptable to the Gradu¬ 
ate School of the University, the two fellowships carry a stipend of $2,200 a 
year and are normally renewable. 

Fellows will be engaged in part-time research in the general area of psycho¬ 
logical measurement at the offices of the Educational Testing Service and will, 
in addition, carry a normal program of studies in the Graduate School. Com¬ 
petence in mathematics and psychology is a prerequisite for obtaining these 
fellowships. Information and application blanks may be obtained from: 
Director of Psychometric Fellowship Program, Educational Testing Service, 
Box 592, Princeton, N. J. 


Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Actuarial Society of America 
and the American Institute of Actuaries to the nine undergraduates ranking 
highest on the combined score on Part 1 and Part 2 of the 1948 Preliminary 
Actuarial Examinations are as follows: 


First Prize of $200
Edward H. Larson        Massachusetts Institute of Technology

Additional Prizes of $100
John E. Brownlee        Haverford College
William L. Farmer       University of Alabama
Joseph P. Fennell       Princeton University
Bert F. Green, Jr.      Yale University
Solomon Leader          Rutgers University
Felix A. E. Pirani      University of Western Ontario
Richard J. Semple       University of Toronto
Charles A. Yardley      Dartmouth College


The two actuarial organizations have authorized a similar set of nine prize 
awards for the 1949 Examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three exam¬ 
inations: 


Part 1. Language Aptitude Examination. 

(Reading comprehension, meaning of words and word relationships, antonyms, and 
verbal reasoning.) 

Part 2. General Mathematics Examination. 

(Algebra, trigonometry, coordinate geometry, differential and integral calculus.) 
Part 3. Special Mathematics Examination. 

(Finite differences, probability and statistics.)





The 1949 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and 
Canada on May 13-14, 1949. The closing date for applications is March 15, 
1949. 

Detailed information concerning the Examinations can be obtained from either 
of the following organizations: 

American Institute of Actuaries, 

135 South LaSalle Street, 

Chicago 3, Illinois. 

The Actuarial Society of America, 

393 Seventh Avenue, 

New York 1, New York. 


New Members 

The following persons have been elected to membership in the Institute 
(March 1 to May 31, 1948) 

Alder, Arthur, Ph.D. (Univ. of Berne) Professor of Actuarial Science, University of Berne, 
Schlaeflistrasse 2, Berne, Switzerland. 

Andrews, Fred C., B.S. (Univ. of Washington) Research Fellow, Department of Mathematics,
University of Washington, 141 Savery Hall, University of Washington, Seattle,
Washington.

Archer, John, Actuary, Pensions Section, Lever Brothers and Unilever Ltd., 5A Spencer 
Hill, Wimbledon, S. W. 19, England. 

Benitz, Paul A., M.A. (Stanford Univ.) 173 Serpentine Road, Tenafly, New Jersey. 

Bennett, George K., Ph.D. (Yale) President of the Psychological Corporation, 522 Fifth 
Avenue, New York 18, New York. 

Berrettoni, J. N., Ph.D. (Univ. of Minnesota) Professional Consultant in Statistics and 
Economics, 632 Erie St., S. E., Minneapolis 14, Minnesota. 

Birnbaum, Allan, A.B. (Univ. of Calif., Los Angeles) Teaching Assistant, Mathematical 
Statistics Department, Columbia University, 500 Riverside Drive, Room 454, New 
York 27, New York. 

Blank, Mark, M.A. (Univ. of Pennsylvania) Instructor of Philosophy, University of Pennsylvania,
223 E. Sedgwick, Philadelphia, Pa.

Blumen, Isadora, B.A. (Univ. of Minn.) Student, Department of Mathematical Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 

Burdick, Wayne E., M.A. (Univ. of Mich.) Student, University of Michigan, 314 S. Fifth 
Avenue, Ann Arbor, Michigan. 

Chaturvedi, Jagdish C., M.Sc. (Agra Univ., India) Lecturer in Statistics, St. John’s College, 
37, Delhi Gate, Agra, U.P., India. 

Cote, Louis J., A.M. (Univ. of Mich.) Student, University of Michigan, 315 North State 
Street, Ann Arbor, Michigan. 

Dunleavy, Mary, A.B. (Hunter College, New York) Statistician, E. I. Dupont de Nemours, 
667 Second Avenue, New York 16, New York. 

Ferber, Robert, M.A. (Univ. of Chicago) Student, University of Chicago, 54 West 89th Street, 
New York 24, New York. 

Forman, John W., M.S. (Univ. of Iowa) Graduate Assistant, Department of Mathematics,
State University of Iowa, Iowa City, Iowa.





Franklin, Nathan M., M.S. (Univ. of Mich.) Student, Univ. of Michigan, Box 195, Moodus,
Connecticut.

Fraser, Donald A. S., M.A. (Univ. of Toronto) Instructor in Statistics, Graduate College,
Princeton, New Jersey.

Grabowski, Edwin F., A.B. (George Washington Univ.) Student, George Washington University,
1880-80th Street, N.W., Washington, D. C.

Healy, William C., Jr., B.S.E. (Univ. of Mich.) Student, University of Michigan, 689 Lincoln,
Grosse Pointe, Michigan.

Heimdahl, Olaf E. W., A.B. (Luther College, Washington) Teaching Fellow, Department of
Mathematics, University of Washington, 4886 Union Bay Lane, Seattle 5, Washington.

Henriksen, Robert O., B.Sc. (Univ. of Mich.) Student, University of Michigan, 761 Clancy
Avenue, Grand Rapids, Michigan.

Howard, William G., B.S. (Western Carolina Teachers College, Cullowhee, N. C.) Student,
Institute of Statistics, University of North Carolina, Route 1, Morrisville, North
Carolina.

Irick, Paul E., M.S. (Purdue Univ.) Mathematics Instructor, Purdue University, 729
North Grant St., West Lafayette, Indiana.

Johnson, Elgy S., M.A. (Univ. of Mich.) Student, University of Michigan, 18907 Lincoln
Street, Detroit 5, Michigan.

Kaplan, E. L., B.S. (Carnegie Inst. of Tech.) Mathematician, Naval Ordnance Laboratory,
1427 N. St., N. W., Washington 6, D. C.

Kaufman, Arthur, M.A. (Columbia Univ.) Student and Lecturer of Mathematics, Columbia 
University, 1280 Sheridan Avenue, New York 66, New York. 

Link, Richard F., B.S. (Univ. of Oregon) 750 W. Sixth St., Eugene, Oregon. 

Marks, Charles L., M.A. (Univ. of North Carolina) Instructor of Mathematics, University 
of North Carolina, 218 Mangum Dormitory, University of North Carolina, Chapel Hill, 
North Carolina. 

Marquardt, Mary, M.A. (Univ. of Illinois) Assistant Professor of Statistics, New York State 
School of Industrial and Labor Relations, Cornell University, Ithaca, New York. 

Mickey, Max R., Jr., B.S. (Virginia Polytechnic Institute) Graduate Student and Graduate 
Assistant, Department of Mathematics, Iowa State College, 706 Ash Avenue, Ames, 
Iowa.

Mindlin, Albert, B.A. (Univ. of California, Los Angeles) Teaching Assistant, Mathematics 
Department, University of California, 2444 Carlston Street, Berkeley 4, California. 

Morris, William S., A.B. (Princeton) Statistician, First Boston Corporation, 100 Broadway, 
New York 5, New York. 

Netzorg, Morton J., Engineer, Development Tire Engineering Department, U. S. Rubber 
Co., Detroit, Michigan, 2528 Gladstone, Detroit 6, Michigan. 

Norton, James A., Jr., A.B. (Antioch College) Graduate Research Assistant, Veterans 
Guidance Center, Purdue University, West Lafayette, Indiana. 

Perrin, John K., A.B. (Columbia College) Assistant Statistician, American Telephone & 
Telegraph Co., 195 Broadway, New York 7, New York. 

Perry, Norman C., M.A. (Univ. of Southern Calif.) Lecturer in Mathematics, Mathematics 
Department, University of Southern California, Los Angeles, California. 

Powell, Charles Jr., Actuary, Coates and Herfurth, Consulting Actuaries, 116 S. Virgil 
Avenue, Los Angeles 4, California. 

Raiffa, Howard, B.S. (Univ. of Mich.) Student, University of Michigan, 1447 Enfield Court,
Willow Run Village, Michigan.

Raup, Joan E., B.A. (Barnard College) Statistical Analyst, Bureau of the Budget, 1486 N.
Street, N. W., Washington 6, D. C.

Rubinstein, David, B.S. (Univ. of Wash.) Research Assistant, Statistical Laboratory, University
of California, 2216 Parker Street, Berkeley 4, California.





Schlenz, John W., B.S. (Baldwin-Wallace College) Student, University of Michigan, 8306
Vineyard Avenue, Cleveland 5, Ohio.

Scott, Elizabeth L., A.B. (Univ. of California) Research Assistant, Statistical Laboratory,
Department of Mathematics, University of California, Berkeley 4, California.

Seidman, Herbert, B.A. (Brooklyn College) Junior Statistician, Chief, Statistical Information
Section, New York University and Student, New York University, 3170 New
York Avenue, Brooklyn 10, New York.

Shaw, Oliver A., B.A. (Univ. of Mississippi) U.S. Air Force, 6481 Brooke Lane, N. W.,
Washington, D. C.

Shellard, Gordon D., B.S. (Mass. Institute of Tech.) Assistant Section Head, Underwriting
Studies Section, Actuarial Division, Metropolitan Life Insurance Co., 430 Mountain
Avenue, Ridgewood, New Jersey.

Shepherd, Clarence M., M.S. (Case Institute of Tech.) Electrochemical Research Chemist,
8950 Nichole Avenue, S. W., Washington, D. C.

Shrikhande, Sharad-Chandra S., B.Sc. (Nagpur Univ., India) Graduate student, Depart¬ 
ment of Mathematical Statistics, University of North Carolina, Chapel Hill, North 
Carolina. 

Sirlin, Robert, M.A. (Columbia Univ.) Statistician, Financial Analysis, 8046 East 83rd 
Street, Brooklyn 89, New York. 

Stacy, Edney W., A.B. (Univ. of North Carolina) Instructor of Mathematics, University
of North Carolina, SOI W. Franklin Street, Chapel Hill, North Carolina.

Sternhell, Charles M., B.S. (College of City of N. Y.) Section Head, Actuarial Division, 
Metropolitan Life Insurance Co., 1 Madison Avenue, New York City, New York. 

Tang, Pei-Ching, Ph.D. (Univ. College, London Univ.) Professor, National Central Uni¬ 
versity, Nanking, China. 

Whitson, Milo E., A.M. (Geo. Peabody College, Nashville) Head of Mathematics Depart¬ 
ment, California State Polytechnic College, 583 Lawrence Dr., San Luis Obispo, 
California. 

Watson, Geoffrey S., B.A. (Univ. of Melbourne) Student, Institute of Statistics, State 
College, Raleigh, North Carolina. 

Woolsey, Theodore D., B.A. (Yale Univ.) Biostatistician, Division of Public Health Meth¬ 
ods, U. S. Public Health Service, 111 West Underwood St., Chevy Chase 15, Maryland. 

Wymer, John P., M.A. (Univ. of California, Berkeley) Statistician, U. S. Bureau of Labor
Statistics, 719 Whittier St., N.W., Washington 18, D. C.

Yerushalmy, Jacob, Ph.D. (Johns Hopkins Univ.) Professor of Biostatistics, School of 
Public Health, University of California, Berkeley 4, California. 



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The Thirty-fourth Meeting and the Third Regional West Coast Meeting of 
the Institute of Mathematical Statistics was held on the Berkeley Campus of the 
University of California June 22nd through June 24th, 1948, in conjunction 
with the Twenty-ninth Annual Meeting of the Pacific Division of the American 
Association for the Advancement of Science. During the meeting 115 persons 
registered, including the following members of the Institute: 

G. A. Baker, Blair M. Bennett, Carl A. Bennett, Z. Wm. Birnbaum, David Blackwell,
Albert H. Bowker, George W. Brown, A. George Carlton, Douglas G. Chapman, Andrew G.
Clark, Edwin L. Crow, Dorothy Cruden, Harold Davis, R. C. Davis, W. J. Dixon, Robert
Dorfman, George Eldredge, Lillian Elveback, Mary Elveback, Benjamin Epstein, M. W.
Eudey, Evelyn Fix, Merrill M. Flood, H. H. Germond, Meyer A. Girshick, Eugene L.
Grant, John Gurland, T. E. Harris, J. L. Hodges, Jr., Paul G. Hoel, John M. Howell, Harry
M. Hughes, Leo Katz, H. S. Konijn, T. C. Koopmans, George W. Kuznets, E. L. Lehmann,
Richard F. Link, A. M. Mood, Stanley W. Nash, J. Neyman, Stefan Peters, G. Baley Price,
Kathryn B. Rolfe, Leonard J. Savage, Henry Scheffé, Howard L. Schug, Elizabeth L.
Scott, Esther Seiden, Milton Sobel, Zenon Szatrowski, John W. Tukey, J. R. Vatnsdal,
A. Wald, John E. Walsh, Holbrook Working, Zivia S. Wurtele.

The Tuesday morning session was devoted to a symposium on Mathematical 
Theory of Games with Professor G. C. Evans of the University of California, 
Berkeley, as chairman. Addresses were: 

1. Survey of von Neumann's mathematical theory of games. J. C. C. McKinsey, Project 
Rand. 

2. Recent developments in the mathematical theory of games. Olaf Helmer, Project Rand. 

3. Applications of theory of games to statistics. Abraham Wald, Columbia University. 

4. On continuous games. Henri F. Bohnenblust, California Institute of Technology. 

5. Discussion. Edward W. Barankin, University of California, Berkeley. 

The attendance was approximately 130. 

The Tuesday afternoon session was under the chairmanship of Professor Henri 
F. Bohnenblust of the California Institute of Technology. The invited address, 
Complete Classes of Statistical Decision Functions, by Professor Abraham Wald
was followed by tea in Senior Women’s Hall and then the following contributed 
papers: 

1. Identification as a problem of inference. T. C. Koopmans, Cowles Commission for 
Research in Economics. 

Discussion: Olav Reiersøl, University of Oslo.

2. Estimation of parameters for truncated multinormal distributions. Z. W. Birnbaum, 
E. Paulson and F. C. Andrews, University of Washington. 

3. A test of the hypothesis that a sample of three came from the same normal distribution. 
Carl A. Bennett, General Electric Company. 

4. A note on the application of the abbreviated Doolittle solution to nonorthogonal analysis
of variance and covariance. (By title.) Carl A. Bennett, General Electric Company.

The attendance was between 100 and 150 during the afternoon. 



The Wednesday morning session was devoted to a symposium on Design of 
Experiments with Particular Reference to Agricultural Trials. Dean A. R. 
Davis of the University of California, Berkeley, presided briefly and then 
Professor Abraham Wald took over the duties of chairman. The papers were: 

1. Recent advances in experimental design. R. C. Bose, University of Calcutta.

2. Yield trials with backcrossed derived lines of wheat. G. A. Baker and F. N. Briggs, 
University of California at Davis. 

3. Selecting subset which includes the largest of a number of means. Charles Stein, Uni¬ 
versity of California, Berkeley. 

4. Discussion. A. G. Clark, Colorado State College; S. W. Nash, University of Cali¬ 
fornia, Berkeley; J. R. Vatnsdal, State College of Washington. 

5. The effect of inbreeding on height at withers in a herd of Jersey cattle. W. C. Rollins, 
S. W. Mead and W. M. Regan, University of California at Davis. 

Attendance was about 100. 

The afternoon session, under the chairmanship of Professor George Pólya of
Stanford University, began with an invited address by Professor Michel Loève,
University of California, Berkeley, on Random Functions and Related Problems.
This was followed by the contributed papers: 

1. An example of a singular continuous distribution. (By title.) Henry Scheffé, University
of California at Los Angeles.

2. On the theory of some nonparametric hypotheses. E. L. Lehmann and Charles Stein, 
University of California, Berkeley. 

3. Compound randomization in the binary system. John E. Walsh, Project Rand. 

4. A multiple decision problem arising in the analysis of variance. Edward Paulson, 
University of Washington. 

5. Recurrence formulae for the moments and seminvariants of the joint distribution of the
sample mean and variance. Olav Reiersøl, University of Oslo.

6. Identification problem in factor analysis. (By title.) Olav Reiersøl, University of
Oslo.

7. On distinct hypotheses. Mrs. Agnes Berger, Columbia University. 

The attendance was approximately 100. 

A symposium on Sampling for Industrial Use occupied the Thursday morning
session. Professor Z. W. Birnbaum of the University of Washington presided.

1. Sampling plans for continuous production. M. A. Girshick, Project Rand.

2. Sampling plans with continuous variables for acceptance inspection. A. H. Bowker,
Stanford University.

3. Place of statistical sampling in the education of engineers. E. L. Grant, Stanford
University.

4. Discussion. Henry Scheffé, University of California at Los Angeles; Charles Stein,
University of California, Berkeley; Holbrook Working, Stanford University. 

The attendance was approximately 100. 

The first part of the afternoon session, presided over by Professor W. J. Dixon, 
University of Oregon, was devoted to contributed papers: 

1. Statistical problems of medical diagnosis. Jerzy Neyman, University of California, 
Berkeley. 

Discussion: L. J. Savage, University of Chicago.





2. Power of certain tests relating to medical diagnosis. C. L. Chiang and J. L. Hodges, 
University of California, Berkeley. 

3. On best asymptotically normal estimates. Edward W. Barankin and John Gurland, 
University of California, Berkeley. 

4. Iterative treatment of continuous birth processes. T. E. Harris, Project Rand. 

5. Estimation of means on the basis of preliminary tests of significance. Blair M. Bennett, 
University of California, Berkeley. (By title.) 

The attendance was about 90. 

The second part of the afternoon session was the Business Meeting. Professor 
Abraham Wald, President of the Institute, presided. It was recommended that 
two meetings a year be held on the West Coast, one in June in the San Francisco 
Bay Area alternating between Berkeley and Stanford and the other during the 
winter alternating between the North and Los Angeles. The next West Coast 
meeting will be held during the Thanksgiving recess at Seattle. 



TESTING COMPOUND SYMMETRY IN A NORMAL 
MULTIVARIATE DISTRIBUTION 

By David F. Votaw, Jr. 

Yale University 

Summary. In this paper test criteria are developed for testing hypotheses
of “compound symmetry” in a normal multivariate population of $t$ variates
($t \ge 3$) on basis of samples. A feature common to the twelve hypotheses considered
is that the set of $t$ variates is partitioned into mutually exclusive subsets
of variates. In regard to the partitioning, the twelve hypotheses can be divided
into two contrasting but very similar types, and the six in one type can be paired
off in a natural way with the six in the other type. Three of the hypotheses
within a given type are associated with the case of a single sample and moreover
are simple modifications of one another; the remaining three are direct extensions
of the first three, respectively, to the case of $k$ samples ($k \ge 2$). The gist of any
of the hypotheses is indicated in the following statement of one, denoted by
$H_1(mvc)$: within each subset of variates the means are equal, the variances are equal
and the covariances are equal and between any two distinct subsets the covariances
are equal.

The twelve sample criteria for testing the hypotheses are developed by the 
Neyman-Pearson likelihood-ratio method. The following results are obtained 
for each criterion (assuming that the respective null hypotheses are true) for 
any admissible partition of the t variates into subsets and for any sample size, 
$N$, for which the criterion’s distribution exists: (i) the exact moments; (ii) an
identification of the exact distribution as the distribution of a product of inde¬ 
pendent beta variates; (iii) the approximate distribution for large N. Exact 
distributions of the single-sample criteria are given explicitly for special values 
of t and special partitionings. 

Certain psychometric and medical research problems in which hypotheses of
compound symmetry are relevant are discussed in section 1. Sections 2-6 give
statements of the hypotheses and an illustration, for $H_1(mvc)$, of the technique
of obtaining the moments and identifying the distributions. Results for the
other criteria are given in sections 7-8. Illustrative examples showing applications
of the results are given in section 9.

1. Introduction. In studying psychological examinations, or other measuring 
devices, one may wish to test whether several forms of an examination may be 
used interchangeably. Consider the case of three forms, and assume that 
scores of individuals on the three forms have a normal 3-variate distribution. 
The hypothesis of interchangeability is equivalent to the hypothesis that in the 
normal distribution the means are equal, the variances are equal, and the covariances
are equal. When this hypothesis is true, the normal distribution is invariant
over all permutations of the variates and is said to have complete symmetry.
It is frequently more important, however, not only to test that the forms
have completely symmetric relations with each other but also that they are inter¬ 
changeable with regard to their relation to some outside criterion measure (e.g., 
the criterion might be skill in a given task). Assuming that the scores of in¬ 
dividuals on the three forms and the criterion have a normal 4-variate distribu¬ 
tion, the hypothesis of interchangeability is equivalent to the hypothesis of 
equality of means, equality of variances, and equality of covariances among the 
three forms and equality of covariances between forms and criterion. When 
this hypothesis is true, the 4-variate normal distribution is invariant over all 
permutations of the three variates (associated with forms) among themselves, 
and so the variance-covariance matrix has the following form: 


$$
\begin{Vmatrix}
A & C & C & C \\
C & B & D & D \\
C & D & B & D \\
C & D & D & B
\end{Vmatrix}
$$


where the quantity A represents the variance of the criterion measure. A 
normal distribution for which this hypothesis is true is said to have compound 
symmetry (of type I). A more general case of compound symmetry (of type I) 
arises when there are several examinations (no two of which need have the same 
number of forms) and several outside criteria. 
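
The type I pattern for one criterion and three forms can be checked numerically. The following is a minimal sketch, assuming the variates are ordered with the criterion first and the forms after it; the function names and the numerical values are hypothetical and serve only to illustrate the invariance described in the text.

```python
from itertools import permutations


def type1_cov(A, B, C, D, n_forms=3):
    """Variance-covariance matrix for one criterion followed by n_forms forms.

    A: variance of the criterion, B: common variance of the forms,
    C: common covariance between the criterion and each form,
    D: common covariance between any two forms.
    """
    size = 1 + n_forms
    cov = [[D] * size for _ in range(size)]
    for i in range(size):
        cov[i][i] = B
    cov[0][0] = A
    for j in range(1, size):
        cov[0][j] = cov[j][0] = C
    return cov


def permute(cov, order):
    """Reorder the variates: entry (i, j) of the result is cov[order[i]][order[j]]."""
    return [[cov[i][j] for j in order] for i in order]


def invariant_under_form_permutations(cov):
    """True if every permutation of the forms (criterion held fixed) leaves cov unchanged."""
    size = len(cov)
    return all(
        permute(cov, (0,) + perm) == cov for perm in permutations(range(1, size))
    )


cov = type1_cov(A=4.0, B=2.0, C=0.5, D=1.0)
print(invariant_under_form_permutations(cov))  # the forms are interchangeable
print(permute(cov, (1, 0, 2, 3)) == cov)       # criterion and a form are not
```

Permuting the forms among themselves leaves the matrix unchanged, whereas exchanging the criterion with a form does not unless A = B and C = D, which would reduce the pattern to complete symmetry.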

The hypothesis of complete symmetry may arise in certain medical-research
problems. For example, suppose a measurement (e.g., %CO$_2$ in blood) is made
at each of three times (say $T_1$, $T_2$, $T_3$) on an experimental animal and assume
that the three quantities have a normal 3-variate distribution; one may then be
interested in testing the hypothesis of complete symmetry on basis of measurements
(considered as a random sample) made on several experimental animals.
More generally, let there be two characteristics, say $U$ and $W$ (e.g., %CO$_2$ in
blood and %O$_2$ in blood), which are both measured at each of two times, $T_1$,
$T_2$. Let it be assumed that the four quantities, which we represent as $UT_1$,
$UT_2$, $WT_1$, $WT_2$, have a normal 4-variate distribution. One may then be
interested in testing the hypothesis that the means of the first two variates are
equal, the means of the second two are equal, and the variance-covariance matrix
has the form:


$$
\begin{Vmatrix}
E & F & K & L \\
F & E & L & K \\
K & L & G & J \\
L & K & J & G
\end{Vmatrix}
$$


When this hypothesis is true, the 4-variate distribution is said to have compound 
symmetry (of type II). A more general case of compound symmetry (of type II) 
arises when there are $h$ characteristics and $n$ times ($h, n = 2, 3, \ldots$).
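
For the case of two characteristics and two times, the type II pattern can be examined in the same way. The sketch below assumes the variates are ordered as $UT_1$, $UT_2$, $WT_1$, $WT_2$; the function names and numerical values are hypothetical. It checks an observation that can be read off the matrix: exchanging the two times for both characteristics at once leaves it unchanged, while exchanging them for one characteristic only does not.

```python
def type2_cov(E, F, G, J, K, L):
    """Type II pattern for the variates UT1, UT2, WT1, WT2 (in that order).

    E, F: variance and covariance of U across the two times,
    G, J: variance and covariance of W across the two times,
    K: covariance of U and W at the same time,
    L: covariance of U and W at different times.
    """
    return [
        [E, F, K, L],
        [F, E, L, K],
        [K, L, G, J],
        [L, K, J, G],
    ]


def permute(cov, order):
    """Reorder the variates: entry (i, j) of the result is cov[order[i]][order[j]]."""
    return [[cov[i][j] for j in order] for i in order]


cov = type2_cov(E=3.0, F=1.0, G=2.0, J=0.5, K=0.4, L=0.2)

# Swap T1 and T2 for U and W simultaneously: UT2, UT1, WT2, WT1.
print(permute(cov, (1, 0, 3, 2)) == cov)

# Swap T1 and T2 for U only: UT2, UT1, WT1, WT2.
print(permute(cov, (1, 0, 2, 3)) == cov)
```

The second comparison fails whenever K differs from L, which is what distinguishes this pattern from the type I pattern, where the members of one subset may be permuted independently of the other variates.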









Either of the two types of compound symmetry is a direct extension of complete 
symmetry. Wilks [5] has thoroughly treated the sampling theory of certain 
criteria for testing various hypotheses of complete symmetry regarding a normal 
multivariate distribution. 

The problems dealt with in this paper are: (i) to give sample criteria for 
testing hypotheses of compound symmetry regarding a normal multivariate 
distribution, and (ii) to give the moments and identify the distribution of each 
sample criterion when the corresponding hypothesis is true. 

The hypotheses are stated in section 2. Certain properties of compound symmetric
normal distributions are given in section 3. Sections 4, 5, and 6 together
give the method of deriving each sample criterion and the methods of obtaining
the criterion's moments and identifying its distribution; the methods are illustrated
for one of the hypotheses. Sections 7-8 give the other criteria and their
moments together with approximate distributions of the criteria for large sample
sizes. Exact distributions of some of the criteria are given in section 7g for
certain special compound symmetries. Section 9 contains two illustrative
examples.

2. Statements of hypotheses. Let $\Pi$ be a normal $t$-variate population and
$X_i$ $(i = 1, \ldots, t)$ $(t \ge 3)$ be the $i$-th variate in $\Pi$. Let the set of variates $X_1$,
$X_2, \ldots, X_t$ be partitioned into $q$ mutually exclusive subsets of which, say,
$b$ subsets contain exactly one variate each and the remaining $q - b = h$ subsets
(where $h \ge 1$) contain $n_1, n_2, \ldots, n_h$ variates, respectively, where $n_a \ge 2$
$(a = 1, \ldots, h;\ b + \sum_{a=1}^{h} n_a = t)$. No generality is lost in assuming that the $t$
variates are ordered so that the first $b$ belong to the $b$ subsets containing one
variate each, the next $n_1$ variates belong to the $(b + 1)$-th subset, $\ldots$, the last $n_h$
variates to the $q$-th subset, where $n_1 \le n_2 \le \cdots \le n_h$. Let $(1^b, n_1, n_2, \ldots, n_h)$
represent such a partition of the variates $X_1, \ldots, X_t$ into subsets; when $b = 0$
the term $1^b$ will be omitted. The notation can be simplified when $n_1, n_2, \ldots,$
$n_h$ are not all unequal; e.g., $(1^b, 2, 2)$ can be written as $(1^b, 2^2)$.

In the statement of each of the following six hypotheses it is assumed that there
is a preassigned partition $(1^b, n_1, n_2, \ldots, n_h)$ of the $t$ variates into $q$ subsets
$(q = b + h)$.

(1) $H_1(mvc)$: The hypothesis that within each subset of variates the means
are equal, the variances are equal, and the covariances are equal and that between
any two distinct subsets of variates the covariances are equal.

(2) $H_1(vc)$: The hypothesis that within each subset of variates the variances
are equal and the covariances are equal and that between any two distinct subsets
of variates the covariances are equal.

(3) $H_1(m)$: The hypothesis that within each subset of variates the means are
equal, given that $H_1(vc)$ is true.

(4) $H_k(MVC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions are
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the variates
into subsets $(k \ge 2)$.



450 


DAVID F. VOTAW, JR. 


(5) $H_k(VC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions have
the same variance-covariance matrix, given that they all satisfy $H_1(mvc)$ for a
particular division of the variates into subsets $(k \ge 2)$.

(6) $H_k(M \mid mVC)$: The hypothesis that $k$ normal $t$-variate distributions are
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the
variates into subsets and that they all have the same variance-covariance matrix
$(k \ge 2)$.

Any of the hypotheses stated above can be expressed in terms of an invariance
condition on the normal $t$-variate distribution (or distributions); e.g., $H_1(mvc)$
is equivalent to the hypothesis that the distribution is invariant over all permutations
of the variates within subsets. The pattern of symmetry present in the
variance-covariance matrix of the distribution when any of the above six hypotheses
is true is given in section 3 (see (3.2)).

Six additional hypotheses, $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, $\ldots$, $\bar{H}_k(M \mid mVC)$, which are
modifications of $H_1(mvc)$, $H_1(vc)$, $\ldots$, $H_k(M \mid mVC)$, respectively, will also be
considered. In regard to any of these six $\bar{H}$ hypotheses, it is assumed that there
is a partition $(n^h)$ $(n = 2, 3, \ldots)$ of the $t$ variates $(t = nh)$ and that in each subset
the variates are in a given order; thus each subset has $n$ variates and between
any two distinct subsets of variates there are $n^2$ covariances, which form an $n \times n$
“block” in the variance-covariance matrix of the distribution (see (3.4)). The
hypotheses may now be stated as follows:

$\bar{H}_1(mvc)$: The hypothesis that within each subset of variates the means are
equal, the variances are equal, and the covariances are equal and that between
any two distinct subsets of variates the diagonal covariances are equal and the
off-diagonal covariances are equal.

$\bar{H}_1(vc)$: The hypothesis that within each subset of variates the variances are
equal and the covariances are equal and that between any two distinct subsets
of variates the diagonal covariances are equal and the off-diagonal covariances are
equal.

The statement of any of the hypotheses $\bar{H}_1(m)$, $\bar{H}_k(MVC \mid mvc)$, $\bar{H}_k(VC \mid mvc)$,
and $\bar{H}_k(M \mid mVC)$ is obtained from the statement of the corresponding $H$
hypothesis by simply substituting $\bar{H}$ for $H$. The pattern of symmetry present
in the variance-covariance matrix of the distribution when any of the six $\bar{H}$
hypotheses is true is given in section 3 (see (3.4)), from which the appropriate
invariance condition on the normal distribution can be obtained.

A test of any of the hypotheses $H_1(mvc)$, $\bar{H}_1(mvc)$, $H_1(vc)$, $\bar{H}_1(vc)$, $H_1(m)$, $\bar{H}_1(m)$
is based on a random sample from a normal multivariate distribution; a test of
any of the remaining hypotheses is based on $k$ random samples from $k$ normal
multivariate distributions, respectively $(k \ge 2)$.

A normal distribution for which an $H$ or $\bar{H}$ hypothesis is true will be called
compound symmetric. In the special case where the compound symmetry holds
for a partition $(t)$ of the $t$ variates, any $H$ hypothesis and the $\bar{H}$ hypothesis
corresponding to it are identical; in this case the normal distribution will be
called completely symmetric. Problems (i) and (ii) (see section 1) have been
solved completely by Wilks [5] for $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$ for the case of
complete symmetry.

3. Block symmetric matrices and vectors. Let $m_i$ be the mean value of $X_i$
and $\|\rho_{ij}\sigma_i\sigma_j\|$ be the variance-covariance matrix of $X_1, \ldots, X_t$ $(i, j = 1, \ldots, t)$
($\rho_{ij}$ is the coefficient of correlation between $X_i$ and $X_j$). The joint probability
density function$^1$ of $X_1, X_2, \ldots, X_t$ is

(3.1) $$f(X_1, X_2, \ldots, X_t) = \pi^{-t/2}\,|G_{ij}|^{1/2} \exp\Big[-\sum_{i,j} G_{ij}(X_i - m_i)(X_j - m_j)\Big],$$

where $\|G_{ij}\|$ is positive definite and its inverse $\|G^{ij}\| = \|2\rho_{ij}\sigma_i\sigma_j\|$.

When any of the $H$ hypotheses is true (see section 2), we represent $\|G^{ij}\|$
by $\|A^{ij}\|$ (also $\|G_{ij}\|$ by $\|A_{ij}\|$) which can be written as (3.2) (see page 452),
where $A^{ss'} = A^{s's}$ $(s, s' = 1, \ldots, b)$ and $D^{aa'} = D^{a'a}$ $(a, a' = 1, \ldots, h;\ a \ne a')$.
The $A$'s and $B$'s with single superscripts and the $C$'s and $D$'s have been introduced
to indicate the block pattern clearly. In general $C^{sa} = C^{s'a'}$ only if
$s = s'$ and $a = a'$ $(s, s' = 1, \ldots, b;\ a, a' = 1, \ldots, h)$. $\|A_{ij}\|$ and $\|A^{ij}\|$ have the same
block pattern of symmetry.

The blocks in (3.2) are formed by making a partition $(1^b, n_1, n_2, \ldots, n_h)$ of the
$t$ rows and $t$ columns of $\|A^{ij}\|$. A matrix having the block pattern of symmetry
of (3.2) will be called block symmetric of type I. Clearly a block symmetric
matrix of type I is invariant over all permutations of its rows and columns within
the subsets determined by $(1^b, n_1, \ldots, n_h)$, if the row interchanges and column
interchanges are the same. Also, a $t$-component vector will be called block
symmetric if the order of values of the components is invariant over all permutations
of the components within groups determined by $(1^b, n_1, \ldots, n_h)$.

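The determinant of the block symmetric matrix $\|A_{ij}\|$ is

(3.3) $$|A_{ij}| = K \prod_{a=1}^{h} (A_a - B_a)^{n_a - 1},$$

where

$$K = \begin{vmatrix}
A_{11} & \cdots & A_{1b} & C'_{11} & C'_{12} & \cdots & C'_{1h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
A_{b1} & \cdots & A_{bb} & C'_{b1} & C'_{b2} & \cdots & C'_{bh} \\
C'_{11} & \cdots & C'_{b1} & A'_1 & D'_{12} & \cdots & D'_{1h} \\
C'_{12} & \cdots & C'_{b2} & D'_{21} & A'_2 & \cdots & D'_{2h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
C'_{1h} & \cdots & C'_{bh} & D'_{h1} & D'_{h2} & \cdots & A'_h
\end{vmatrix},$$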


$^1$ In general a chance quantity and the variable of its distribution function will be denoted
by the same symbol.





(3.2) $$\|A^{ij}\| = \begin{Vmatrix}
A^{11} & \cdots & A^{1b} & C^{11}\mathbf{1}'_{n_1} & \cdots & C^{1h}\mathbf{1}'_{n_h} \\
\vdots & & \vdots & \vdots & & \vdots \\
A^{b1} & \cdots & A^{bb} & C^{b1}\mathbf{1}'_{n_1} & \cdots & C^{bh}\mathbf{1}'_{n_h} \\
C^{11}\mathbf{1}_{n_1} & \cdots & C^{b1}\mathbf{1}_{n_1} & M^{1} & \cdots & D^{1h}J_{n_1 n_h} \\
\vdots & & \vdots & \vdots & & \vdots \\
C^{1h}\mathbf{1}_{n_h} & \cdots & C^{bh}\mathbf{1}_{n_h} & D^{h1}J_{n_h n_1} & \cdots & M^{h}
\end{Vmatrix}$$

Here the array is written compactly: $\mathbf{1}_{n}$ is a column of $n$ ones, $J_{n n'}$ is the $n \times n'$ array of ones, and $M^{a}$ is the $n_a \times n_a$ array with $A^{a}$ in each diagonal place and $B^{a}$ in each off-diagonal place $(a = 1, \ldots, h)$.


where $C'_{sa} = C_{sa}\sqrt{n_a}$, $A'_a = A_a + (n_a - 1)B_a$, and $D'_{aa'} = D_{aa'}\sqrt{n_a n_{a'}}$
$(s = 1, \ldots, b;\ a, a' = 1, \ldots, h;\ a \ne a')$; $A_{ss'}$, $C_{sa}$, $A_a$, $B_a$, and $D_{aa'}$ are the cofactors
of $A^{ss'}$, $C^{sa}$, $A^a$, $B^a$, and $D^{aa'}$, respectively, in (3.2). The ellipsoid defined by
$\sum_{i,j} A_{ij}(X_i - m_i)(X_j - m_j) = r_0$ ($r_0$ fixed and $> 0$) has $(n_a - 1)$ axes of equal
length $(a = 1, \ldots, h)$; and each of the remaining $q$ axes is inclined to the coordinate
axes so that its direction cosines have the same block symmetry as the
set of diagonal elements in (3.2).
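The factorization in (3.3) can be verified numerically for a particular block symmetric matrix of type I. The following is a minimal sketch under stated assumptions: the partition $(1^1, 2, 3)$, the numerical entries, and all function names are hypothetical choices made for illustration. It builds the full $t \times t$ matrix, builds the reduced $(b + h) \times (b + h)$ determinant $K$ from $C'_{sa} = C_{sa}\sqrt{n_a}$, $A'_a = A_a + (n_a - 1)B_a$, $D'_{aa'} = D_{aa'}\sqrt{n_a n_{a'}}$, and compares the two sides.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def full_matrix(A_single, C, A, B, D, sizes):
    """Block symmetric matrix of type I for the partition (1^b, n_1, ..., n_h)."""
    b = len(A_single)
    labels = [("s", s) for s in range(b)]
    for a, n in enumerate(sizes):
        labels += [("g", a)] * n
    t = len(labels)
    out = [[0.0] * t for _ in range(t)]
    for i, (ki, ui) in enumerate(labels):
        for j, (kj, uj) in enumerate(labels):
            if ki == "s" and kj == "s":
                out[i][j] = A_single[ui][uj]
            elif ki == "s":
                out[i][j] = C[ui][uj]
            elif kj == "s":
                out[i][j] = C[uj][ui]
            elif ui == uj:
                out[i][j] = A[ui] if i == j else B[ui]
            else:
                out[i][j] = D[ui][uj]
    return out

def reduced_matrix(A_single, C, A, B, D, sizes):
    """The (b+h) x (b+h) array whose determinant is K in (3.3)."""
    b, h = len(A_single), len(sizes)
    K = [[0.0] * (b + h) for _ in range(b + h)]
    for s in range(b):
        for s2 in range(b):
            K[s][s2] = A_single[s][s2]
        for a in range(h):
            K[s][b + a] = K[b + a][s] = C[s][a] * math.sqrt(sizes[a])
    for a in range(h):
        for a2 in range(h):
            if a == a2:
                K[b + a][b + a] = A[a] + (sizes[a] - 1) * B[a]
            else:
                K[b + a][b + a2] = D[a][a2] * math.sqrt(sizes[a] * sizes[a2])
    return K

# Hypothetical entries for the partition (1^1, 2, 3), so b = 1, h = 2, t = 6.
sizes = [2, 3]
A_single = [[5.0]]
C = [[0.4, 0.7]]
A = [6.0, 7.0]
B = [1.0, 2.0]
D = [[0.0, 0.5], [0.5, 0.0]]  # only the off-diagonal entries are used

lhs = det(full_matrix(A_single, C, A, B, D, sizes))
rhs = det(reduced_matrix(A_single, C, A, B, D, sizes))
for a, n in enumerate(sizes):
    rhs *= (A[a] - B[a]) ** (n - 1)
```

The two values `lhs` and `rhs` agree to floating-point accuracy, which reflects the fact that each group of $n_a$ variates contributes $n_a - 1$ latent roots equal to $A_a - B_a$.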

When any of the $\bar{H}$ hypotheses is true, we represent $\|G^{ij}\|$ by $\|A^{ij}\|$ (also
$\|G_{ij}\|$ by $\|A_{ij}\|$) which can be written as

(3.4) $$\|A^{ij}\| = \begin{Vmatrix}
M^{1} & M^{12} & \cdots & M^{1h} \\
M^{21} & M^{2} & \cdots & M^{2h} \\
\vdots & \vdots & & \vdots \\
M^{h1} & M^{h2} & \cdots & M^{h}
\end{Vmatrix},$$

where the blocks are formed by a partition $(n^h)$ of the $t$ rows and $t$ columns; thus
each block is an $n \times n$ array. Here the array is written compactly: $M^{a}$ has $A^{a}$ in each diagonal place and $B^{a}$ in each off-diagonal place, and $M^{aa'}$ has $C^{aa'}$ in each diagonal place and $D^{aa'}$ in each off-diagonal place $(a, a' = 1, \ldots, h;\ a \ne a')$. $\|A^{ij}\|$ and $\|A_{ij}\|$ have the same block pattern
of symmetry.

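A matrix having the block pattern of symmetry of (3.4) will be called block
symmetric of type II. The determinant of $\|A_{ij}\|$ is

(3.5) $$|A_{ij}| = K^{\,n-1}\, Q,$$

where

$$K = \begin{vmatrix}
A_1 - B_1 & C_{12} - D_{12} & \cdots & C_{1h} - D_{1h} \\
C_{21} - D_{21} & A_2 - B_2 & \cdots & C_{2h} - D_{2h} \\
\vdots & \vdots & & \vdots \\
C_{h1} - D_{h1} & C_{h2} - D_{h2} & \cdots & A_h - B_h
\end{vmatrix}, \qquad
Q = \begin{vmatrix}
A'_1 & C'_{12} & \cdots & C'_{1h} \\
C'_{21} & A'_2 & \cdots & C'_{2h} \\
\vdots & \vdots & & \vdots \\
C'_{h1} & C'_{h2} & \cdots & A'_h
\end{vmatrix},$$

where $A'_a = A_a + (n - 1)B_a$ and $C'_{aa'} = C_{aa'} + (n - 1)D_{aa'}$ $(a, a' =
1, 2, \ldots, h;\ a \ne a')$; $A_a$, $B_a$, $C_{aa'}$, and $D_{aa'}$ are the cofactors of $A^a$, $B^a$, $C^{aa'}$,
$D^{aa'}$, respectively, in (3.4).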

4. Method of obtaining the sample criteria. The probability distribution,
$P$, of a simple, random sample, say $O_N(X_{1\alpha}, X_{2\alpha}, \ldots, X_{t\alpha})$ $(\alpha = 1, 2, \ldots, N)$,
from $\Pi$ is

(4.1) $$P = \pi^{-Nt/2}\,|G_{ij}|^{N/2} \exp\Big[-\sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j)\Big].$$

For $O_N$ fixed, $P$ is the likelihood function of the parameters $m_1, m_2, \ldots, m_t$
and $G_{ij}$ $(i, j = 1, 2, \ldots, t)$. To obtain sample criteria for testing the $H$ and $\bar{H}$
hypotheses we shall employ the Neyman-Pearson likelihood-ratio method. The
details of applying this method will be given for only one of the hypotheses, since
the technique of application is the same for all the hypotheses under
consideration.

In applying the likelihood-ratio method we maximize $P$ under two different
sets of conditions and form the ratio of the two maxima. To derive a criterion
for, say, $H_1(mvc)$, we first maximize $P$ over the set, $\Omega$, of admissible values of the
parameters in (4.1); secondly, we maximize $P$ over the set, $\omega$, of admissible values
of the parameters in (4.1) that satisfy $H_1(mvc)$. Let $P_\Omega$ and $P_\omega$ be these maxima,
respectively. The likelihood-ratio criterion for $H_1(mvc)$ is $\lambda_1(mvc) = P_\omega/P_\Omega$;
thus $0 \le \lambda_1(mvc) \le 1$. The sample criterion, $L_1(mvc)$, for $H_1(mvc)$ will be chosen
as a single-valued function of $\lambda_1(mvc)$.

4a. Derivation of the criterion $L_1(mvc)$. The parameter spaces, $\Omega$ and $\omega$, can
be specified as follows:

$\Omega$: (1) $\|G_{ij}\|$ positive definite;
(2) $-\infty < m_i < +\infty$ $(i = 1, 2, \ldots, t)$;

$\omega$: (1) $\|A_{ij}\|$ positive definite and block symmetric (of type I);
(2) $-\infty < m_i < +\infty$, $(m_1, m_2, \ldots, m_t)$ block symmetric.

The block symmetries in $\omega(1)$ and $\omega(2)$ are for the same partition $(1^b, n_1, \ldots, n_h)$
of the $t$ variates (see sections 2 and 3).

Maximizing $P$ is equivalent to maximizing

$$L = \ln P = -(Nt/2)\ln \pi + (N/2)\ln |G_{ij}| - \sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j).$$

Solving the simultaneous equations $\partial L/\partial m_i = 0$ $(i = 1, \ldots, t)$ and $\partial L/\partial G_{ij} =
0$ $(i, j = 1, \ldots, t;\ i \le j)$ for $m_i$ and $G_{ij}$, we have

(4.3) $$\hat{m}_i = (1/N)\sum_{\alpha=1}^{N} X_{i\alpha} = \bar{X}_i, \qquad
(N/2)\,\hat{G}^{ij} = \sum_{\alpha=1}^{N} (X_{i\alpha} - \bar{X}_i)(X_{j\alpha} - \bar{X}_j) = v_{ij};$$

substituting these values of the parameters into (4.1) we find that

(4.4) $$P_\Omega = \pi^{-Nt/2}\,(2/N)^{-Nt/2}\,|v_{ij}|^{-N/2} \exp[-Nt/2].$$

In (4.3) and (4.5) each expression at the extreme right is defined by the corresponding
expressions at the left.

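In $\omega(2)$ there are $b + h$ groups of means, the means within a group being all
equal; let $m_s$ be the $s$-th mean and $m'_{r_a}$ be the common value of the means in the
$(b + a)$-th group. Solving the simultaneous equations $\partial L/\partial m_s = 0$,
$\partial L/\partial m'_{r_a} = 0$, $\partial L/\partial A_{ss'} = 0$, $\partial L/\partial C_{sa} = 0$, $\partial L/\partial A_a = 0$, $\partial L/\partial B_a = 0$,
$\partial L/\partial D_{aa'} = 0$ $(s, s' = 1, \ldots, b;\ a, a' = 1, \ldots, h;\ a \ne a')$, we find that

(4.5) $$\begin{aligned}
\hat{m}_s &= \bar{X}_s, \\
\hat{m}'_{r_a} &= (1/Nn_a)\sum_{i_a,\alpha} X_{i_a\alpha} = \bar{X}'_a, \\
(N/2)\hat{A}^{ss'} &= \sum_{\alpha=1}^{N} (X_{s\alpha} - \bar{X}_s)(X_{s'\alpha} - \bar{X}_{s'}) = v_{ss'}, \\
(N/2)\hat{C}^{sa} &= (1/n_a)\sum_{\alpha, i_a} (X_{s\alpha} - \bar{X}_s)(X_{i_a\alpha} - \bar{X}'_a) = u_{sa}, \\
(N/2)\hat{A}^{a} &= (1/n_a)\sum_{\alpha, i_a} (X_{i_a\alpha} - \bar{X}'_a)^2 = v'_a, \\
(N/2)\hat{B}^{a} &= [1/n_a(n_a - 1)]\sum_{\alpha,\, i_a \ne j_a} (X_{i_a\alpha} - \bar{X}'_a)(X_{j_a\alpha} - \bar{X}'_a) = w'_a, \\
(N/2)\hat{D}^{aa'} &= (1/n_a n_{a'})\sum_{\alpha, i_a, i_{a'}} (X_{i_a\alpha} - \bar{X}'_a)(X_{i_{a'}\alpha} - \bar{X}'_{a'}) = z'_{aa'},
\end{aligned}$$

where $i_a, j_a = b + \bar{n}_a + 1, \ldots, b + \bar{n}_{a+1}$; $i_a \ne j_a$; $\bar{n}_a = n_1 + \cdots + n_{a-1}$;
$\bar{n}_1 = 0$; $a, a' = 1, \ldots, h$; $a \ne a'$.

When $H_1(mvc)$ is true, the maximum likelihood estimates of $m_i$, $\sigma_i$, and
$\rho_{ij}$ $(i, j = 1, \ldots, t)$ would be obtained by means of (4.5) and the definition of
$\|G^{ij}\|$ given just after (3.1).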

Substituting the expressions in (4.5) into (4.1) we find that

(4.6) $$P_\omega = \pi^{-Nt/2}\,|v'_{ij}|^{-N/2}\,(2/N)^{-Nt/2} \exp[-Nt/2],$$

where





From (4.4) and (4.6) it follows that the likelihood-ratio criterion for $H_1(mvc)$ is:

(4.8) $$\lambda_1(mvc) = \big[\,|v_{ij}|\,/\,|v'_{ij}|\,\big]^{N/2}, \qquad (i, j = 1, \ldots, t).$$

Finally, as the sample criterion for $H_1(mvc)$ we choose

(4.9) $$L_1(mvc) = [\lambda_1(mvc)]^{2/N} = |v_{ij}|\,/\,|v'_{ij}|.$$

4b. Preliminary calculations for evaluation of moments of $L_1(mvc)$. The determinant
$|v'_{ij}|$ in (4.9) is block symmetric. From (3.2), (3.3), and (4.9) it follows
that:

(4.10) $$L_1(mvc) = |v_{ij}| \Big[\prod_{a=1}^{h} (v'_a - w'_a)^{-(n_a - 1)}\Big]\, |v''_{rr'}|^{-1},$$

where

$$v''_{ss'} = v_{ss'}, \quad v''_{s r_a} = u_{sa}\sqrt{n_a}, \quad v''_{r_a r_a} = v'_a + (n_a - 1)w'_a, \quad v''_{r_a r_{a'}} = \sqrt{n_a n_{a'}}\; z'_{aa'},$$

$(s, s' = 1, \ldots, b;\ r, r' = 1, \ldots, b + h;\ r_a = b + a;\ a = 1, \ldots, h)$.

Let $Y_{i\alpha} = X_{i\alpha} - m_i$ and $\bar{Y}_i = (1/N)\sum_{\alpha=1}^{N} Y_{i\alpha}$ $(i = 1, \ldots, t)$. Clearly
$v_{ij} = \sum_{\alpha=1}^{N} (Y_{i\alpha} - \bar{Y}_i)(Y_{j\alpha} - \bar{Y}_j)$. When $H_1(mvc)$ is true, $u_{sa}$, $v'_a$, $w'_a$, and $z'_{aa'}$,
in $L_1(mvc)$, can be expressed exactly as they are expressed in (4.5) with $Y$ substituted
for $X$, and $(v'_a - w'_a)$ and $v''_{rr'}$ in (4.10) can be expressed as follows:

(4.11) $$\begin{aligned}
v'_a - w'_a &= (1/n_a)\Big\{\sum_{i_a} v_{i_a i_a} - [1/(n_a - 1)]\sum_{i_a \ne j_a} v_{i_a j_a}\Big\}
+ (N/n_a)\sum_{i_a} \bar{Y}_{i_a}^2 - [N/n_a(n_a - 1)]\sum_{i_a \ne j_a} \bar{Y}_{i_a}\bar{Y}_{j_a}, \\
v''_{ss'} &= v_{ss'}, \\
v''_{s r_a} &= (1/\sqrt{n_a})\sum_{i_a} v_{s i_a}, \\
v''_{r_a r_a} &= (1/n_a)\sum_{i_a, j_a} v_{i_a j_a}, \\
v''_{r_a r_{a'}} &= (1/\sqrt{n_a n_{a'}})\sum_{i_a, i_{a'}} v_{i_a i_{a'}}.
\end{aligned}$$

From (4.10) and (4.11) it follows that when $H_1(mvc)$ is true, each element of the
determinants on which $L_1(mvc)$ depends consists of: (a) a quadratic form in the $\bar{Y}_i$
and a linear function of the $v_{ij}$; or (b) merely a linear function of the
$v_{ij}$ $(i, j = 1, \ldots, t)$.

The joint probability density function of the $v_{ij}$ and $\bar{Y}_i$ is

(4.12) $$f(v_{ij})\,g(\bar{Y}_1, \ldots, \bar{Y}_t),$$

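where

$$f(v_{ij}) = \frac{|G_{ij}|^{(N-1)/2}\,|v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G_{ij}v_{ij}\big]}{\pi^{t(t-1)/4}\prod_{i=1}^{t} \Gamma[(N - i)/2]}$$

($\|v_{ij}\|$ positive definite; $N > t$), which is the Wishart distribution [9, p. 120], and

$$g(\bar{Y}_1, \ldots, \bar{Y}_t) = |G_{ij}|^{1/2} N^{t/2} \pi^{-t/2} \exp\Big[-N\sum_{i,j} G_{ij}\bar{Y}_i\bar{Y}_j\Big] = g(\bar{Y}), \text{ say,}$$

which is a normal $t$-variate distribution. The $d$-th moment $(d = 0, 1, \ldots)$
of $L_1(mvc)$, when $H_1(mvc)$ is true, is

(4.13) $$E[L_1(mvc)]^d = \int_R f(v_{ij})\,g(\bar{Y})\,[L_1(mvc)]^d \prod_{i \le j} dv_{ij} \prod_{i} d\bar{Y}_i,$$

where the domain, $R$, of integration is $-\infty < \bar{Y}_i < +\infty$, $\|v_{ij}\|$ positive semi-definite
$(i, j = 1, \ldots, t)$. The integral in (4.13) is evaluated in section 6 (by
means of Wilks' moment-generating operators) for the case where $H_1(mvc)$
is true.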


5. Remarks on Wilks' Moment-Generating Operators. Wilks' operators
are applicable to a far wider class of problems than those treated in this paper.
The following discussion is confined to a special use of the operators.

From (4.12) it follows that

(5.1) $$\int_{R'} \frac{|G_{ij}|^{(N-1)/2}\,|v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G_{ij}v_{ij}\big] \prod_{i \le j} dv_{ij}}{\pi^{t(t-1)/4}\prod_{i=1}^{t} \Gamma[(N - i)/2]} = 1,$$

where $R'$ is the region in the space of $v_{ij}$ for which $\|v_{ij}\|$ is positive definite, and
$\|G_{ij}\|$ is positive semi-definite. (Of course, the probability that $\|v_{ij}\|$ is not
positive definite is 0.) Let $G'_{ij} = G_{ij} + \beta_{ij}$ $(i, j = 1, \ldots, t)$; if all the $\beta_{ij}$ are
sufficiently small, $\|G'_{ij}\|$ is positive definite, and we have

(5.2) $$|G_{ij}|^{(N-1)/2} \int_{R'} \frac{|v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G'_{ij}v_{ij}\big] \prod_{i \le j} dv_{ij}}{\pi^{t(t-1)/4}\prod_{i=1}^{t} \Gamma[(N - i)/2]} = |G_{ij}|^{(N-1)/2}\,|G'_{ij}|^{-(N-1)/2},$$

which is $E(g)$, where $g = \exp\big[-\sum_{i,j} \beta_{ij}v_{ij}\big]$.

Let $I_{ij}$ be an operator (whose operand is a function of all the $\beta_{ij}$) which represents
the following set of operations: (a) replacement of each $\beta_{ij}$ in the operand
by $\beta_{ij} + \xi_i\xi_j$; (b) integration (of the result of (a)) with respect to $\xi_i$ $(i = 1, \ldots, t)$
from $-\infty$ to $+\infty$; (c) multiplication of the result of (a) and (b) by $\pi^{-t/2}$. From
(3.1) it follows that

(5.3) $$I_{ij}(g) = I_{ij}\Big(\exp\Big[-\sum_{i,j} \beta_{ij}v_{ij}\Big]\Big) = g\,|v_{ij}|^{-1/2}, \qquad (\|v_{ij}\| \text{ pos. def.});$$

and if all the $\beta_{ij}$ are set equal to 0 after performing the $I$-operations, then $g = 1$
and (5.3) yields $|v_{ij}|^{-1/2}$. Let $I^{\lambda}_{ij}$ be $\lambda$ repetitions $(\lambda = 1, 2, \ldots)$ of $I_{ij}$.
Clearly,

(5.4) $$E[I^{\lambda}_{ij}(g)]_{\beta_{ij}=0} = E\big[g\,|v_{ij}|^{-\lambda/2}\big]_{\beta_{ij}=0} = E\big[|v_{ij}|^{-\lambda/2}\big].$$

Under all conditions of their use in this paper the $I$ operations are interchangeable
with the $E$ operation [8, p. 316]; thus,

$$E[I^{\lambda}_{ij}\,g] = I^{\lambda}_{ij}\,E[g].$$

From (5.2), (5.4), and [8, pp. 318-320] we have

(5.5) $$E\big[|v_{ij}|^{-\lambda/2}\big] = |G_{ij}|^{(N-1)/2}\big[I^{\lambda}_{ij}\,|G'_{ij}|^{-(N-1)/2}\big]_{\beta_{ij}=0}
= |G_{ij}|^{\lambda/2} \prod_{i=1}^{t} \phi[N - i, -\lambda],$$

where $N > t + \lambda + 1$ and $\phi(R, S) = \Gamma[(R + S)/2]\,/\,\Gamma[R/2]$.

The operator $I_{ij}$ may be used, as indicated above, to find negative half-integer
moments of $|v_{ij}|$. To obtain positive half-integer moments of $|v_{ij}|$ we may
use an inverse operator $I^{-\lambda}_{ij}$ [8, pp. 321-323] $(\lambda = 1, 2, \ldots)$ which has been
defined in such a way that

(5.6) $$|G_{ij}|^{(N-1)/2}\big\{I^{-\lambda}_{ij}\,|G'_{ij}|^{-(N-1)/2}\big\}_{\beta_{ij}=0} = E\big[|v_{ij}|^{\lambda/2}\big]
= |G_{ij}|^{-\lambda/2} \prod_{i=1}^{t} \phi[N - i, \lambda].$$

The equality between the second and third expressions in (5.6) can be obtained
from (5.1) by replacing $N$ by $N + \lambda$ (see [7]).

In (5.5) and (5.6) the $\beta$'s are not necessary; however, in (4.13) and in similar
expressions for the moments of the other criteria there are several determinants;
each determinant requires a distinct $I$-operator, and it is of great convenience to
introduce a distinct set of $\beta$'s for each $I$-operator. The $\beta$'s associated with a
given operator may initially appear in more than one of the determinants in the
operand. The order in which several $I$-operators are used is illustrated in the
following case for two:

(5.7) $$\big[\,I'^{\,\lambda}_{ij}\,|G'_{ij}|^{-k'}\; I''^{\,-\mu}_{ij}\,|G''_{ij}|^{-k''}\,\big]_{\beta''_{ij}=0,\ \beta'_{ij}=0},$$

where $\lambda, \mu > 0$ and the values of $k'$ and $k''$ are such that the value of the expression
is well defined. The notation in (5.7) means that $I''^{\,-\mu}_{ij}$ is applied to $|G''_{ij}|^{-k''}$,
the $\beta$'s associated with $I''^{\,-\mu}_{ij}$ are set equal to zero, then $I'^{\,\lambda}_{ij}$ is applied to the product
of $|G'_{ij}|^{-k'}$ and the results of the previous operations, and then the $\beta$'s associated
with $I'^{\,\lambda}_{ij}$ are set equal to zero. The interchangeability of the order of $I$ operations
is discussed in [8, p. 324].


6. The moments and distribution of $L_1(mvc)$ when $H_1(mvc)$ is true. To
evaluate (4.13) we let

( 6 . 1 ) g = exp [- £ /3„ v„ - £ ~ w' a ) - ]£ 0 rr< »/«•']• 

x j a r r' 


From (4.11) and (4.12) we have 

(6.2) E(g) = I A„ r | a:, r™1 A", \- l, \ 


where 

A,,' = A,,' + ft*# + fit*' y 

A,t a = C, a + Ptt a + PtrJ^/nay 

^4*o»a $»o*a + Ai/tta + P” a rjn*i 

■4»alo “ “f" ^tala Pa/iV'a l)^o “f“ $r a r a /^a » (^a 7^ Ja), 

“4»aJa' ™ 7) a a' “1“ P tala' Pr a r a '/V^a Ha' > (u 5^ U ), 

A.'./ = , 

j" _ 

— wa , 

^4* a t a = Aa ~f“ Pa/^la , 

^4*aJa “ H a Pa/^aijla l)> ('la 7^ Ja), 

= Daa' , (a 7* a'). 


When $H_1(mvc)$ is true, we have

E[Umvc)) d = i a„ i Wn/r <n ‘- i) iAr / |(/“,(/r“iA:,r <w - i/i) ) 4 . / - # k;4, 

U" 1 J 0«-O 

(6.3) = jlI*(iV - *, M)}{]W + 2rf - r, - 2rf)| 

X {n^[W + 2d)(n a - 1), - 2d(n a - l)]}j]I(n a - l)‘ l<n ‘“ ,> }, 

(<i =0,1,2, ••• ;JV > «), 

where <7 = b + h and ^(R, S) is defined in (5.5). In (6.3) the assumption that 
Hi(mvc ) is true implies that after we apply 77, w and set the p XJ equal to 0 all 
remaining determinants are block symmetric; we may then use ( 3 . 3 ) before 





applying I and I 2 a d(na l> , (a = 1, • • • , h). The expression in (6.3) may be 
written as follows: 


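(6.4) $$E[L_1(mvc)]^d = \prod_{a=1}^{h}\prod_{s_a=1}^{n_a - 1} \frac{\Big(\dfrac{N - q - s_a - \bar{n}_a + a - 1}{2}\Big)_d}{\Big(\dfrac{N}{2} + \dfrac{s_a - 1}{n_a - 1}\Big)_d},$$

where $\bar{n}_a$ is defined in (4.5) and $(T)_d = \Gamma(T + d)/\Gamma(T)$.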

We now consider the problem of identifying from (6.4) the distribution of
$L_1(mvc)$ (when $H_1(mvc)$ is true). Let $\theta$ be a beta variate, i.e., a variate whose
c.d.f., $F(\theta)$, is

(6.5) $$F(\theta) = I_\theta(P, Q), \qquad (0 \le \theta \le 1;\ P, Q > 0),$$

which is the Incomplete Beta Function ratio. $I_\theta(P, Q)$ is tabulated in [1]
and [3]. The $d$-th moment of $\theta$ is:

(6.6) $$E(\theta)^d = \frac{\Gamma(P + d)\,\Gamma(P + Q)}{\Gamma(P)\,\Gamma(P + Q + d)} = (P)_d/(P + Q)_d,$$

$(d = 0, 1, \ldots)$. Let

(6.7) $$\tau = \prod_{j=1}^{c} \theta_j, \qquad (c = 1, 2, \ldots),$$

where the $\theta_j$ $(j = 1, \ldots, c)$ are mutually independent and each $\theta_j$ is a beta
variate, having parameters $p_j$, $q_j$, say. The $d$-th moment of $\tau$ is

(6.8) $$E(\tau)^d = \prod_{j=1}^{c} (p_j)_d/(p_j + q_j)_d, \qquad (d = 0, 1, \ldots).$$
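The moment formulas (6.6) and (6.8) are easy to evaluate directly. The sketch below is an illustration only; the function names and the parameter values are assumptions. It computes $(T)_d = \Gamma(T + d)/\Gamma(T)$, the $d$-th moment of a single beta variate, and the $d$-th moment of a product of independent beta variates.

```python
import math

def rising(T, d):
    """(T)_d = Gamma(T + d) / Gamma(T)."""
    return math.gamma(T + d) / math.gamma(T)

def beta_moment(P, Q, d):
    """d-th moment of a beta variate with parameters P, Q, as in (6.6)."""
    return rising(P, d) / rising(P + Q, d)

def product_moment(params, d):
    """d-th moment of a product of independent beta variates, as in (6.8).

    params is a list of (p_j, q_j) pairs.
    """
    m = 1.0
    for p, q in params:
        m *= beta_moment(p, q, d)
    return m

# Hypothetical parameter values chosen for easy hand checking.
first_moment = beta_moment(2.0, 3.0, 1)      # mean of a beta(2, 3) variate
second_moment = beta_moment(2.0, 3.0, 2)
tau_mean = product_moment([(2.0, 3.0), (1.0, 1.0)], 1)
```

For these values the mean is $2/5$, the second moment is $(2 \cdot 3)/(5 \cdot 6) = 1/5$, and the mean of the product is $(2/5)(1/2) = 1/5$. For large arguments `math.lgamma` would be a safer choice than `math.gamma`, which can overflow.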


Given a variate, say $\mu$ $(0 \le \mu \le 1)$, whose $d$-th moment $(d = 0, 1, \ldots)$ is given
by (6.8) we can infer by means of the solution of the Hausdorff problem of moments
that $\mu$ and $\tau$ have the same exact probability distribution function (see
Corollary 1.1 [2, p. 11]). It should be noted that (6.4) can be written as

(6.9) $$E[L_1(mvc)]^d = \prod_{a=1}^{h}\prod_{s_a=1}^{n_a - 1} (p_{a s_a})_d\,/\,(p_{a s_a} + q_{a s_a})_d,$$

where $p_{a s_a} = [(N - q - s_a - \bar{n}_a + a - 1)/2] > 0$,

$$q_{a s_a} = \Big[\frac{s_a - 1}{n_a - 1} + \frac{q + s_a + \bar{n}_a - a + 1}{2}\Big] > 0;$$

thus (6.4) is a special case of (6.8).

The exact probability (density) function, say g(r), of r has been obtained by 
Wilks [7, p. 475] and is: 

g(r) = Kt p ‘-' (1 - T ) r ° _,o_1 J\..£ vr'vl'- 1 • • • ufa*" 1 

X (1 - (1 - • • • (1 — 

(6.10) X [1 - »i(l - r)l p ‘- ,s - 41 [1 - {»i + «i(l - Vi) } (1 - t)]”*-'*- 4 * • • • 

X [1 — {t»i + t>i(l — Wl) + • • • + Ve-l(l — fl)(l — Vt) • • • (1 — «.-»)} 

(1 - r) p, ~ l ~ p ‘~ ,e ] 


e-l 

X n dv,, 


i-1 


where K = 


tt r r (p/_+ ?/)*i 

M L r(p,)ffe) J, 


tt = 


£ 

7'-0 


(Pe-j' H” Qc-j'), 


7jj = 2J p c -j'. An approximation of the distribution of a product of inde- 
J'- o 

pendent beta variates by the distribution of a single beta variate is given in [4]. 

The results of this section may be summarized as follows: If $H_1(mvc)$ is true,
the $d$-th moment $(d = 0, 1, \ldots)$ of the exact distribution of $L_1(mvc)$ is given by
(6.4). Also, if $H_1(mvc)$ is true, the exact distribution of $L_1(mvc)$ is given by
(6.10), where the $p_j$, $q_j$, and $c$ can be specified by means of (6.4). The cumulative
distribution of $L_1(mvc)$ is given for certain special cases in section 7g.


7. Single Sample Criteria. The solutions of problems (i) and (ii) (see section
1) for $H_1(mvc)$ are contained in (4.9) and the summary at the end of section 6.
In the present section solutions of problems (i) and (ii) are given for each of the
remaining two $H_1$ hypotheses and the three $\bar{H}_1$ hypotheses (all of which are stated
in section 2). For any of the hypotheses the sample criterion is chosen as a
single-valued function of the likelihood-ratio criterion for the hypothesis. The
methods of determining the moments and identifying the distribution of each
sample criterion (when the corresponding null hypothesis is true) are entirely
similar to those used in sections 4, 5, and 6 in regard to $H_1(mvc)$. Section 7g
gives the exact distributions of the single-sample criteria for certain special
compound symmetries.

Each criterion discussed in this section is based on a sample
$O_N(X_{1\alpha}, X_{2\alpha}, \ldots, X_{t\alpha})$ $(\alpha = 1, \ldots, N;\ N > t)$
of size $N$ from a normal $t$-variate distribution $(t = 3, 4, \ldots)$. As in the case of
$H_1(mvc)$, it is presupposed for testing $H_1(vc)$ or $H_1(m)$ that there is a certain
partition $(1^b, n_1, n_2, \ldots, n_h)$ of the $t$ variates; for testing $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$,
or $\bar{H}_1(m)$ it is presupposed that there is a certain partition $(n^h)$ of the $t$ variates
(see sections 2 and 3).


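7a. The test $L_1(vc)$ for the hypothesis $H_1(vc)$. For the sample criterion for
$H_1(vc)$ we choose

(7.1) $$L_1(vc) = [\lambda_1(vc)]^{2/N} = |v_{ij}|\,/\,|\bar{v}_{ij}|, \qquad (i, j = 1, \ldots, t),$$

where $\lambda_1(vc)$ is the likelihood-ratio criterion for $H_1(vc)$, $v_{ij}$ is defined in (4.3), and

$$\begin{aligned}
\bar{v}_{ss'} &= v_{ss'}, \\
\bar{v}_{s i_a} &= (1/n_a)\sum_{i'_a} v_{s i'_a}, \\
\bar{v}_{i_a i_a} &= (1/n_a)\sum_{i'_a} v_{i'_a i'_a}, \\
\bar{v}_{i_a j_a} &= [1/n_a(n_a - 1)]\sum_{i'_a \ne j'_a} v_{i'_a j'_a}, \\
\bar{v}_{i_a i_{a'}} &= (1/n_a n_{a'})\sum_{i'_a, i'_{a'}} v_{i'_a i'_{a'}},
\end{aligned}$$

$(s, s' = 1, \ldots, b;\ a, a' = 1, \ldots, h;\ a \ne a';\ i_a, i'_a, j_a, j'_a = b + \bar{n}_a + 1, \ldots,$
$b + \bar{n}_{a+1};\ \bar{n}_a = n_1 + \cdots + n_{a-1};\ \bar{n}_1 = 0)$. Since $\|\bar{v}_{ij}\|$ is a block symmetric
matrix, there is an expression for $|\bar{v}_{ij}|$ that is entirely similar in form to the
expression in (3.3) for $|A_{ij}|$ (see also (4.9) and (4.10)).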
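The computation of a criterion of the form $|v_{ij}|/|\bar{v}_{ij}|$, in which the denominator matrix is obtained by averaging the elements of $\|v_{ij}\|$ within the blocks of the partition $(1^b, n_1, \ldots, n_h)$, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the small matrices stand in for sums of squares and products that would be computed from a sample as in (4.3).

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def block_average(v, b, sizes):
    """Average v within the blocks of the partition (1^b, n_1, ..., n_h)."""
    t = len(v)
    group = list(range(b))            # each single variate is its own subset
    for a, n in enumerate(sizes):
        group += [b + a] * n
    assert len(group) == t
    q = b + len(sizes)
    members = [[i for i in range(t) if group[i] == g] for g in range(q)]
    vbar = [[0.0] * t for _ in range(t)]
    for g in range(q):
        for g2 in range(q):
            if g == g2:
                idx = members[g]
                diag = sum(v[i][i] for i in idx) / len(idx)
                pairs = [(i, j) for i in idx for j in idx if i != j]
                off = sum(v[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
                for i in idx:
                    for j in idx:
                        vbar[i][j] = diag if i == j else off
            else:
                cells = [(i, j) for i in members[g] for j in members[g2]]
                mean = sum(v[i][j] for i, j in cells) / len(cells)
                for i, j in cells:
                    vbar[i][j] = mean
    return vbar

def criterion_vc(v, b, sizes):
    """Ratio |v| / |v averaged within blocks|."""
    return det(v) / det(block_average(v, b, sizes))

# Hypothetical product-sum matrices.
v2 = [[2.0, 1.0], [1.0, 3.0]]                       # partition (2)
v3 = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 5.0]]  # partition (1, 2)
ratio2 = criterion_vc(v2, 0, [2])
ratio3 = criterion_vc(v3, 1, [2])
```

For `v2` the averaged matrix has diagonal 2.5 and off-diagonal 1, so the ratio is $5/5.25 = 20/21$; for `v3` the ratio is $44/47.25$. A matrix that is already block symmetric gives a ratio of 1.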

If Hi(vc) is true, 


(7.2) 


ElMY - jri - i, 2 d) 

[Um - 1 + 2d)(na ~ 1), -2d(n a - l)]j 
X ffr m - r + 2d, - 2 d]) {n (na - 

(r-l J U“1 



\( N ~< 

h n B —l 

n n 


1 (- 

o-l »o-l 


q — s a — n a + a — 


N ~ 1 , (Sg - 1 ) 
2 ’ t ’ (n. - 1 ) 


~3 


\ ] 


(d = 0, 1, ••■), 


where $q = b + h$ and $\phi(R, S)$, $\bar{n}_a$, and $(T)_d$ are defined in (5.5), (4.5), and (6.4),
respectively. From (7.2) and the argument given after (6.8) it follows that
if $H_1(vc)$ is true, the exact distribution of $L_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.2).





7b. The test $L_1(m)$ for the hypothesis $H_1(m)$. For the sample criterion for
$H_1(m)$ we choose

(7.3) $$L_1(m) = [\lambda_1(m)]^{2/N} = \frac{|\bar{v}_{ij}|}{|v'_{ij}|}, \qquad (i, j = 1, \ldots, t),$$

where $\lambda_1(m)$ is the likelihood-ratio criterion for $H_1(m)$ and $v'_{ij}$ and $\bar{v}_{ij}$ are defined
in (4.7) and (7.1), respectively. In passing we note that

(7.4) $$[L_1(m)][L_1(vc)] = L_1(mvc).$$

If Hi(m) is true, 

E[L 1 (m)] i = fl MOV - l)(»a - 1), 2a(n„ - 1)] 

0—1 


(7.5) 


X il{n a - 1){N + 2a), —2a(n a — 1)1 j 

(N_- 1 SaJ-A ' 

\ 2 ' n„ — IM 

[ \2 + n« - lj* J 


A fa—1 

= nn< 

a—1 s a —l 


id - 0 , 1 , ...). 


If $H_1(m)$ is true, the exact distribution of $L_1(m)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.5). It follows from (7.5) that the
exact distribution of $L_1(m)$, when $H_1(m)$ is true, does not depend on $b$.


7c. The test $\bar{L}_1(mvc)$ for the hypothesis $\bar{H}_1(mvc)$. The sample criterion, $\bar{L}_1(mvc)$,
for $\bar{H}_1(mvc)$ (see section 2) is

(7.6) Li(mvc) = [Xi (mvc)f y = | v<, | / ||, (i, j = 1 , • • • , t) 

where Xi(mvc) is the likelihood-ratio criterion for Ri{mvc), v%j is defined in (4.3), 
and 

= (1/a) Z (X„a - x' a f, 

a.Ja 

v,. ia = [l/n(n - 1)] Z (X, ia - r a )(X,'« - X' a ), (4 ^ j„), 

t a 

= (1 /n) Z, (X loa - X’ a )(X k ’, a - X'„.), 

<x•ia>ka , 

{k'a, = j a + n{a r — a); a 5 ^ a'), 

«;.».• = U/a(n - D] Z, 0 X„a - X')(X hi . a - K-), 

a.Ja^a' 

( h' a , 7 * j a + n(a — a'); a t* a'), 

(a = lj * * * , hj i a , Ja f ha , &a = (fl 1 )t2- | 1, • • • ^ 071 9 =s %a "4“ w(fl fl) j 
A 0 > 5^ ia + ft(a' — a); a = 1, • • * , N). || Vi 3 1| is a block symmetric matrix, 

of type II (see (3.4)), in which the blocks are formed by a partition ( n h ) (t = nh) 
of the rows and columns; there is an expression for | v i3 1 that is entirely similar 
in form to the expression in (3.5) for | An |. 





(7.7) 


If ffi(mvc ) is true, 

E[Lj(mvc)] d = (n - l) W(n_1) ( IT - i, 2d) 

U'-'i+i 

X + 2d){n - 1) + 1 - a, -2d(n - 1)] 

N - h - s - (w - l)(g - 1) ^ j 


[(* 

h n-1 I - 

nn v - 


0“1 *“1 


(*L + J ~ « +i^\ 

{ \2 ^ 2(n — 1) ‘ n — l/, 

(<*-o, i, •••). 

If $\bar{H}_1(mvc)$ is true, the exact distribution of $\bar{L}_1(mvc)$ is given by (6.10), where
the $p_j$, $q_j$, and $c$ can be specified by means of (7.7).


7d. The test $\bar{L}_1(vc)$ for the hypothesis $\bar{H}_1(vc)$. The sample criterion, $\bar{L}_1(vc)$, for
$\bar{H}_1(vc)$ (see section 2) is

(7.8) I.(«) = [Xirf* = | | / | v (j | (i,j = 1, ... ,t), 

where Xi(vc) is the likelihood-ratio criterion for Ri(vc), Vij is defined in (4.3), and 

».V. = (1 /n) 23 

la 

v, ml . = [l/n(n - 1)] 23 , (i a 7* jo), 

»<.*.< = (1/n) 23 v, a t'., {k'a- = jo + n{a' - a); a ^ a'), 

la>ka * 

= [l/n(» - 1)] 23 v UK . , (C ^ j a + n(a’ - a); a 7 * a'), 

ia’ha ' 


where the ranges of a, i a , j a , h a , k a are given in (7.6). There is an expression 
forJ| | which is entirely similar in form to the expression in (3.5) for | Ay |. 
If Ri(vc) is true, 


E[Uvc)] d = (» - 1) MC ’ , - 1> rh^AT- i, 2d)] 

L»-fc+i J 

x(£ t[(N - 1 + 2 d)(n - 1) + 1 - a, -2d (» - 1)] 
(7.9) U_l 


fc n-1 

= nn 


— s — (n — l)(a — 1) 


_ ). 

(v^A+ 

\ 2 2(» - 1) n - \)i 


(N - h - s 


(d =0,1, •••). 


If $\bar{H}_1(vc)$ is true, the exact distribution of $\bar{L}_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.9).





7e. The test $\bar{L}_1(m)$ for the hypothesis $\bar{H}_1(m)$. The sample criterion, $\bar{L}_1(m)$,
for $\bar{H}_1(m)$ (see section 2) is

<7.io) 

where Xi(wt) is the likelihood-ratio criterion for R x (m) and || vq || and || vq || are 
given in (7.8) and (7.6), respectively. _ 

If Ri(m) is true, the d-th moment (d = 0,1, • • •) of Li(m) is 

E[L 1 (rn)] i -nn 

(7.11) “- 1 - 1 

(d = 0,1, •••)• 


(*Lzl + J—JL + tul) 

\ 2_2 (n — 1) n — 1/d 


(K + J + 1ZL 1) 

\2 2(n - 1) n - 1 L 


If $\bar{H}_1(m)$ is true, the exact distribution of $\bar{L}_1(m)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.11).


7f. Relations among $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ and among $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$. $L_1(mvc)$ is the product of $L_1(vc)$ and $L_1(m)$ (see (7.4)); moreover,
when $H_1(mvc)$ is true, the $d$-th moment $(d = 0, 1, \ldots)$ of $L_1(mvc)$ equals the
product of the $d$-th moments of $L_1(vc)$ and $L_1(m)$ (see (6.4), (7.2), and (7.5)).
From this result and the argument given after (6.8) it follows that when $H_1(mvc)$
is true, $L_1(mvc)$ is the product of two independent chance quantities, namely,
$L_1(vc)$ and $L_1(m)$. Similarly, when $\bar{H}_1(mvc)$ is true, $\bar{L}_1(mvc)$ is the product of
two independent chance quantities, namely, $\bar{L}_1(vc)$ and $\bar{L}_1(m)$.


7g. Exact distributions of single sample criteria in special cases. For a sample
of size $N$ and a partition $(1^b, n_1, \ldots, n_h)$ of the $t$ variates of $\Pi$ (see section 2)
let the cumulative distribution function (c.d.f.) of $L_1(mvc)$, when $H_1(mvc)$ is
true, be

(7.12) $$F(u \mid 1^b, n_1, \ldots, n_h \mid N) = \text{Prob}\{L_1(mvc) \le u\};$$

also, let $F(y \mid 1^b, n_1, \ldots, n_h \mid N)$ and $F(z \mid 1^b, n_1, \ldots, n_h \mid N)$ be the c.d.f.'s
of $L_1(vc)$ and $L_1(m)$ when $H_1(vc)$ and $H_1(m)$ are true, respectively. Let
$F(u \mid n^h \mid N)$, $F(y \mid n^h \mid N)$, and $F(z \mid n^h \mid N)$ be the c.d.f.'s of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$ when $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, and $\bar{H}_1(m)$ are true, respectively.

It can be shown that

(7.13) $$\begin{aligned}
F(u \mid 1^b, 2 \mid N) &= I_u[(N - b - 2)/2,\ (b + 2)/2], \\
F(u \mid 1^b, 3 \mid N) &= I_{\sqrt{u}}[N - b - 3,\ b + 3], \\
F(y \mid 1^b, 2 \mid N) &= I_y[(N - b - 2)/2,\ (b + 1)/2], \\
F(y \mid 1^b, 3 \mid N) &= I_{\sqrt{y}}[N - b - 3,\ b + 2], \\
F(z \mid 1^b, n \mid N) &= I_{z'}[(N - 1)(n - 1)/2,\ (n - 1)/2], \qquad [z' = z^{1/(n-1)}], \\
F(u \mid 2^2 \mid N) &= I_{\sqrt{u}}[N - 4,\ 3], \\
F(y \mid 2^2 \mid N) &= I_{\sqrt{y}}[N - 4,\ 2], \\
F(z \mid n^2 \mid N) &= I_{z'}[(N - 1)(n - 1) - 1,\ n - 1], \qquad [z' = z^{1/2(n-1)}],
\end{aligned}$$

where $I_\theta(P, Q)$ is defined in (6.5).

Distributions of the criteria in certain cases where the normal distribution is 
completely symmetric (see section 2) are given in [5]. 

7h. Asymptotic distributions of the single sample criteria. When the sample
size, $N$, is large, we may use a theorem [6] (see also [9, pp. 151-2]) concerning
the approximate distribution of the likelihood-ratio criterion. For large $N$ the
distributions of the quantities $-N \ln L_1(mvc)$, $-N \ln L_1(vc)$, and $-N \ln L_1(m)$
(when $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$, respectively, are true) are approximately
chi-square distributions with $(1/2)[t(t + 3) - b(b + 3) - h(h + 5)] - hb$,
$(1/2)[t(t + 1) - b(b + 1) - h(h + 3)] - hb$, and $t - b - h$ degrees of freedom,
respectively. Also, for large $N$ the distributions of the quantities
$-N \ln \bar L_1(mvc)$, $-N \ln \bar L_1(vc)$, and $-N \ln \bar L_1(m)$ (when $\bar H_1(mvc)$, $\bar H_1(vc)$, and
$\bar H_1(m)$, respectively, are true) are approximately chi-square distributions with
$[t(t + 3)/2 - h(h + 2)]$, $[t(t + 1)/2 - h(h + 1)]$, and $t - h$ degrees of freedom,
respectively.
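The degrees of freedom quoted in section 7h are simple polynomials in $t$, $b$, and $h$. The following Python sketch evaluates them; the function names `df_unbarred` and `df_barred` are illustrative and not part of the paper.

```python
def df_unbarred(t, b, h):
    """Degrees of freedom for -N ln L_1(mvc), -N ln L_1(vc), -N ln L_1(m)."""
    # Each bracket is a sum of even products, so integer division is exact.
    df_mvc = (t * (t + 3) - b * (b + 3) - h * (h + 5)) // 2 - h * b
    df_vc = (t * (t + 1) - b * (b + 1) - h * (h + 3)) // 2 - h * b
    df_m = t - b - h
    return df_mvc, df_vc, df_m


def df_barred(t, h):
    """Degrees of freedom for the three barred criteria of section 7h."""
    df_mvc = t * (t + 3) // 2 - h * (h + 2)
    df_vc = t * (t + 1) // 2 - h * (h + 1)
    df_m = t - h
    return df_mvc, df_vc, df_m


# Grouping (1, 3) of four variates: t = 4, b = 1, h = 1.
print(df_unbarred(4, 1, 1))
# Grouping (2^2) of four variates: t = 4, h = 2.
print(df_barred(4, 2))
```

In both families the first count equals the sum of the other two, which agrees with the factorization of the mvc criterion into the vc and m criteria.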

8. $k$-Sample Criteria. In this section solutions of problems (i) and (ii) (see
section 1) are given for the three $H_k$ and the three $\bar H_k$ hypotheses (all stated
in section 2).

A test of any of these hypotheses is based on $k$ simple, random samples ($k \ge 2$)
from $k$ compound-symmetric, normal $t$-variate distributions. The probability
density function, $Q$, of the $k$ samples, say, $O_{N_p}$ ($p = 1, \dots, k$; $N_p > b + h$) is

(8.1) $$Q = \pi^{-tN'/2} \Big[\prod_{p=1}^{k} |G_{ij,p}|^{N_p/2}\Big] \exp\Big[-\sum_{i,j,p,\alpha_p} G_{ij,p}(X_{i\alpha_p} - m_{i,p})(X_{j\alpha_p} - m_{j,p})\Big]$$

($N' = \sum_{p=1}^{k} N_p$; $i, j = 1, \dots, t$), where $X_{i\alpha_p}$ is the $\alpha_p$-th sample value of the
$i$-th variate in the $p$-th population ($\alpha_p = 1, \dots, N_p$), $m_{i,p}$ is the mean (expected
value) of the $i$-th variate in the $p$-th population, and $(1/2)\,\|G_{ij,p}\|^{-1}$ is the
variance-covariance matrix of the variates in the $p$-th population (see (3.1)).
For a given set of $k$ samples $Q$ is the likelihood function of the parameters
$G_{ij,p}$ and $m_{i,p}$ ($i, j = 1, \dots, t$; $p = 1, \dots, k$).



The six hypotheses under consideration (see section 2) can be restated in terms
of $G_{ij,p}$ and $m_{i,p}$; e.g., $H_k(MVC \mid mvc)$ asserts that $m_{i,1} = m_{i,2} = \dots = m_{i,k}$
and $\|G_{ij,1}\| = \|G_{ij,2}\| = \dots = \|G_{ij,k}\|$ given that for all $p$ the vector
$(m_{1,p}, \dots, m_{t,p})$ is block symmetric and the matrix $\|G_{ij,p}\|$ is block symmetric
(of type I) for a preassigned partition $(1^b, n_1, \dots, n_h)$ of the $t$ variates (see
sections 2 and 3).


8a. Expressions for the criteria. Let $\lambda_k(MVC \mid mvc), \dots, \bar\lambda_k(M \mid mVC)$ represent
the likelihood-ratio criteria for the six hypotheses $H_k(MVC \mid mvc), \dots$,
$\bar H_k(M \mid mVC)$ respectively, and let $L_k(MVC \mid mvc), \dots, \bar L_k(M \mid mVC)$ be the
sample criteria for the respective hypotheses. We choose the $L_k$ as follows:

$$
\begin{aligned}
L_k(MVC \mid mvc) &= [\lambda_k(MVC \mid mvc)]^{2},\\
L_k(VC \mid mvc) &= [\lambda_k(VC \mid mvc)]^{2},\\
L_k(M \mid mVC) &= [\lambda_k(M \mid mVC)]^{2/N'} = \left(\frac{L_k(MVC \mid mvc)}{L_k(VC \mid mvc)}\right)^{1/N'};
\end{aligned}
\tag{8.2}
$$

the expressions for $\bar L_k(MVC \mid mvc)$, $\bar L_k(VC \mid mvc)$, and $\bar L_k(M \mid mVC)$ are the same
as those in (8.2) with $\lambda_k$ replaced by $\bar\lambda_k$. The $\lambda_k$ and $\bar\lambda_k$ can be obtained explicitly
by straightforward application of the likelihood-ratio method (see the paragraph
preceding section 4a).


8b. Moments of the $k$-sample criteria. The exact distribution of any of the
$k$-sample criteria, when the corresponding null hypothesis is true, is given in
(6.10), where the quantities $p_i$, $q_j$, and $c$ can be specified by means of the moment
expressions given below. The moments have been obtained by means of the
operators discussed in Section 5.

For each of the following six moment expressions the null hypothesis,
corresponding to the sample criterion involved, is assumed to be true:


E[Lk(MVC | mvc)] d 

( 8 . 3 ) 

E[L k (VC | mvc)] d 


J 4- fa* Z 

2 N p ^ N p 


l 

• if ). . 


43J.G 
SS, G - » 

| sfiftG-^.+ X ), 





E[L„(M | mVOY = IT 


E[L k (MVC I mvc)] d 


fnn'tfYU 

v> p-ia-i u'p-i \2 N p (n a — 1 )/<*(. 

X (il-i)V ’ 

l M Mi \2 ^ jV'(n„ - 1)/d J 
’ (N' - k + 1 - r\ ' 

= frV_ 2 M . 

-i m r 

rrfif 1 - a + 

l M Mi \2 2AT ^ N’ )< J 

tt TT TT ( 1 -4- 1 ~ a + \ 

v J MM \2 T 2N v (n - 1) ^ N p (n - \)) d I 

n"ff I Y- 1 +- 1_ ^-+’ 

l M «M t \2 ^ 2 N’(n - 1) ^ N’(n - \))i J 

f nnftf 1 - a i ( ^ ~ 1) N > 1 

_ I p-i i-i «,-i \2 2N P _ _ N p )d l 


fr TT ft ^ - a 4- ~ 

ElL t (VC | mvc))' = T—f 

TT TT M _ (# +J* ■ 1) I (]< — 1 

M Ml \2 2N’ AT' 


E[L t (M | mVC)\‘ = II 

a-1 


sSG- ? »“ ,)+< V)J 

UllX + ^--) + srl>~%l r 


m 


where d = 0, 1, • • • and ( T)d is defined in (6.4). 


8c. Comments on the criteria. By an argument similar to that used in section 7f
it follows from (8.3) that when $H_k(MVC \mid mvc)$ is true $L_k(MVC \mid mvc)$ is
the product of two independently distributed chance quantities, namely,
$L_k(VC \mid mvc)$ and $[L_k(M \mid mVC)]^{N'}$. The same assertion holds true if we
replace each $L$ by $\bar L$ and $H$ by $\bar H$.

Exact distributions of the $k$-sample criteria, when the corresponding null
hypotheses are true, can be obtained explicitly for special values of $k$ and special
compound symmetries; but owing to lack of space we shall not consider them
in this paper.



When the sample size $N'$ is large, the exact distributions of
$-\ln L_k(MVC \mid mvc)$, $-\ln L_k(VC \mid mvc)$, $-N' \ln L_k(M \mid mVC)$,
$-\ln \bar L_k(MVC \mid mvc)$, $-\ln \bar L_k(VC \mid mvc)$,
and $-N' \ln \bar L_k(M \mid mVC)$ (if the corresponding null hypotheses, respectively,
are true) are approximately chi-square distributions with
$(k - 1)[b(b + 1)/2 + hb + h(h + 3)/2]$,
$q(k - 1)$, $h(h + 2)(k - 1)$, $h(h + 1)(k - 1)$, and $h(k - 1)$ degrees of freedom,
respectively.

9. Illustrative examples. The first of the following two examples$^2$ illustrates
the use of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ in a psychometrics experiment; the second
example illustrates the use of $\bar L_1(mvc)$, $\bar L_1(vc)$, and $\bar L_1(m)$ in a medical-research
experiment (see section 1).

Example 1. In an experiment to establish methods of obtaining reader 
reliability in regard to essay scoring, 126 examinees were given a three-part 
English Composition examination. Each part required that the examinee write 
an essay, and for each examinee four scores were obtained on the following four 
things, respectively: (1) the part-2 and part-3 essays together, (2) the original 
part-1 essay, (3) a long-hand copy of the part-1 essay, (4) a carbon copy of the 
long-hand copy in (3). Scores were assigned by a group of “English Readers” 
using procedures designed to counterbalance certain experimental conditions. 
The score on (1) serves as a criterion. The experimenter asks whether on the 
basis of the sample (of size 126) the quantities associated with (2), (3), and (4) 
can be considered as interchangeable among themselves and interchangeable 
with respect to their relation to the criterion (1). 

Let $X_1$, $X_2$, $X_3$, and $X_4$ be the scores on (1), (2), (3), and (4), respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of scores $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ ($\alpha = 1, \dots, 126$) obtained from
the essays is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
three questions will be considered (see section 2), where the grouping of the four
variates is (1, 3): (a) Is the sample consistent with the hypothesis $H_1(mvc)$?
(b) Is the sample consistent with the hypothesis $H_1(vc)$? (c) Is the sample
consistent with the hypothesis $H_1(m)$? In the particular experiment under
discussion (a) is the experimenter's question.

$^2$ Mr. L. R. Tucker (Educational Testing Service, Princeton, New Jersey) and Captain
J. Allan Rafferty, M.D. (Air University School of Aviation Medicine, Randolph Field,
Texas) kindly gave the author the data for Examples 1 and 2, respectively.



The sample means and variance-covariance matrix are as follows:

            X_1       X_2       X_3       X_4
X_1     77.8976   20.9425   23.4544   18.0384
X_2     20.9425   25.0704   12.4363   11.7257
X_3     23.4544   12.4363   28.2021    9.2281
X_4     18.0384   11.7257    9.2281   22.7390
Means   28.0556   14.9048   15.4841   14.4444

This matrix is $(1/126)\,\|v_{ij}\|$ ($i, j = 1, \dots, 4$) (see (4.3)).

The sample criteria $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ will be used to answer questions
(a), (b), and (c), respectively. The values of the criteria can be computed from the
values of $|v_{ij}|$, $|v'_{ij}|$, and $|\bar v_{ij}|$ (see (4.9), (7.1), (7.3)), where $v'_{ij}$ is given in
(4.7) and $\bar v_{ij}$ is given below (7.1). The $\bar v_{ij}$ ($i \ne 1 \ne j$) are evaluated by simple
averaging of certain elements in $\|v_{ij}\|$. Both $|v'_{ij}|$ and $|\bar v_{ij}|$ have the block pattern
of (3.2) and can be expressed in the simplified form of (3.3), where $h = 1$ and
$n_1 = 3$; the simplified form of $|v'_{ij}|$ can also be obtained from (4.10) and (4.11).
From the data above it is found that

$L_1(mvc) = |v_{ij}| / |v'_{ij}| = .9214$,

$L_1(vc) = |v_{ij}| / |\bar v_{ij}| = .9568$,

$L_1(m) = |\bar v_{ij}| / |v'_{ij}| = .9630$.
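As a rough check on the value of $L_1(vc)$, the ratio of determinants can be recomputed from the tabulated variance-covariance matrix. The sketch below is written in Python and rests on one assumption that is not spelled out on this page: for the grouping (1, 3), the averaged matrix is taken to be the one obtained by replacing the three variances of $X_2, X_3, X_4$, their three mutual covariances, and their three covariances with $X_1$ by the respective arithmetic means. The helper names are hypothetical. Since numerator and denominator are both scaled by 1/126, the factor cancels in the ratio.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def average_for_grouping_1_3(s):
    """Assumed averaging rule for grouping (1, 3): X1 alone, X2..X4 interchangeable."""
    group = [1, 2, 3]
    var = sum(s[i][i] for i in group) / 3
    cov_within = sum(s[i][j] for i in group for j in group if i < j) / 3
    cov_with_x1 = sum(s[0][i] for i in group) / 3
    out = [[0.0] * 4 for _ in range(4)]
    out[0][0] = s[0][0]
    for i in group:
        out[0][i] = out[i][0] = cov_with_x1
        for j in group:
            out[i][j] = var if i == j else cov_within
    return out


# Variance-covariance matrix of Example 1, i.e. (1/126) ||v_ij||.
S = [
    [77.8976, 20.9425, 23.4544, 18.0384],
    [20.9425, 25.0704, 12.4363, 11.7257],
    [23.4544, 12.4363, 28.2021, 9.2281],
    [18.0384, 11.7257, 9.2281, 22.7390],
]

L1_vc = det(S) / det(average_for_grouping_1_3(S))
print(L1_vc)
```

Under this averaging rule the ratio comes out close to the reported .9568, which suggests that the rule matches the construction referred to below (7.1).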

The second, fourth, and fifth formulas in (7.13) (for $N = 126$, $b = 1$, $n = 3$)
give the distributions of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$, respectively (when the
hypothesis with which the criterion is associated is true). By direct computation
with expressions for the Incomplete Beta Function ratios the per cent points
corresponding to the observed values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ are found
to be .26, .49, and .09, respectively. Thus at the 5% significance level the
answer to any given one of the three questions (a), (b), (c) is yes. Critical
values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ for various significance levels can be
obtained from [3] by interpolation.
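The three per cent points of Example 1 can be reproduced with a few lines of Python. In this example all Incomplete Beta parameters happen to be integers, so the ratio $I_x(p, q)$ reduces to a finite binomial sum. The function name `inc_beta_int` is illustrative, and the transformation $z' = z^{1/(n-1)}$ used for $L_1(m)$ is an assumption, since the scan truncates it in (7.13); it is adopted here because it reproduces the reported value.

```python
from math import comb, sqrt


def inc_beta_int(x, p, q):
    """Incomplete Beta ratio I_x(p, q) for positive integers p and q.

    Uses the binomial-tail identity
    I_x(p, q) = sum_{j=p}^{p+q-1} C(p+q-1, j) x^j (1-x)^(p+q-1-j).
    """
    n = p + q - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(p, n + 1))


N, b, n = 126, 1, 3
u, y, z = 0.9214, 0.9568, 0.9630  # observed L_1(mvc), L_1(vc), L_1(m)

# Second formula of (7.13): I_sqrt(u)[N - b - 3, b + 3].
p_mvc = inc_beta_int(sqrt(u), N - b - 3, b + 3)
# Fourth formula: I_sqrt(y)[N - b - 3, b + 2].
p_vc = inc_beta_int(sqrt(y), N - b - 3, b + 2)
# Fifth formula, with the assumed z' = z^(1/(n-1)); both parameters are integers for n = 3.
p_m = inc_beta_int(z ** (1.0 / (n - 1)), (N - 1) * (n - 1) // 2, (n - 1) // 2)

print(p_mvc, p_vc, p_m)
```

The three probabilities agree with the .26, .49, and .09 quoted above to two decimals, all above .05.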

Example 2. In an experiment to study certain properties of the blood of
asphyxiated dogs, the %CO$_2$ and hematocrit of 10 asphyxiated dogs were measured
four minutes and seven minutes after asphyxiation. Let $X_1$ and $X_3$ be
%CO$_2$ and hematocrit four minutes after asphyxiation, respectively, and $X_2$
and $X_4$ be %CO$_2$ and hematocrit seven minutes after asphyxiation, respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of measurements $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ ($\alpha = 1, \dots, 10$) obtained
from the 10 dogs is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
questions will be considered, where the grouping is $(2^2)$: (a) Is the sample
consistent with the hypothesis $\bar H_1(mvc)$? (b) Is the sample consistent with the
hypothesis $\bar H_1(vc)$? (c) Is the sample consistent with the hypothesis $\bar H_1(m)$?
In the particular experiment under discussion (a) is the experimenter's question.



The sample means and sums of squares and cross-products are as follows:

            X_1        X_2        X_3        X_4
X_1     294.916    313.908    -89.364    -69.282
X_2     313.908    363.689   -130.422    -69.261
X_3     -89.364   -130.422    210.350    241.688
X_4     -69.282    -69.261    241.688    515.789
Means    50.780     53.590     41.180     43.890

This matrix is $\|v_{ij}\|$ ($i, j = 1, \dots, 4$) (see (4.3)). The sample criteria $\bar L_1(mvc)$,
$\bar L_1(vc)$, and $\bar L_1(m)$ will be used to answer questions (a), (b), and (c), respectively.
The values of these criteria can be computed from the data above (see (7.6),
(7.8), and (7.10)) and are found to be:

$\bar L_1(mvc) = |v_{ij}| / |\bar v'_{ij}| = .09107$,

$\bar L_1(vc) = |v_{ij}| / |\bar{\bar v}_{ij}| = .3259$,

$\bar L_1(m) = |\bar{\bar v}_{ij}| / |\bar v'_{ij}| = .2794$.

The sixth, seventh, and eighth formulas in (7.13) (for $N = 10$, $n = 2$) give the
distributions of $\bar L_1(mvc)$, $\bar L_1(vc)$, and $\bar L_1(m)$, respectively (when the hypothesis
with which the criterion is associated is true). From [1] it is found that the
observed values of $\bar L_1(mvc)$, $\bar L_1(vc)$, and $\bar L_1(m)$ correspond to the 1.2, 12.4, and
.6 per cent points, respectively, of the distributions referred to above. Thus
at the 5% significance level the answer to questions (a) and (c) is no and to (b)
is yes. The critical values of $\bar L_1(mvc)$, $\bar L_1(vc)$, and $\bar L_1(m)$ for various significance
levels can be found from [3].
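The per cent points of Example 2 can be checked in the same way as a table lookup in [1], since for $N = 10$, $n = 2$ every Incomplete Beta parameter is an integer. The Python sketch below uses a finite binomial sum; the name `inc_beta_int` is illustrative, and the exponent in $z' = z^{1/(2(n-1))}$ is read from a partly illegible line of (7.13), so it should be treated as an assumption.

```python
from math import comb, sqrt


def inc_beta_int(x, p, q):
    """I_x(p, q) for positive integers p, q, as a binomial tail sum."""
    n = p + q - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(p, n + 1))


N, n = 10, 2
u, y, z = 0.09107, 0.3259, 0.2794  # observed barred criteria mvc, vc, m

# Sixth formula of (7.13): I_sqrt(u)[N - 4, 3].
pct_mvc = 100.0 * inc_beta_int(sqrt(u), N - 4, 3)
# Seventh formula: I_sqrt(y)[N - 4, 2].
pct_vc = 100.0 * inc_beta_int(sqrt(y), N - 4, 2)
# Eighth formula, with the assumed z' = z^(1/(2(n-1))).
z_prime = z ** (1.0 / (2 * (n - 1)))
pct_m = 100.0 * inc_beta_int(z_prime, (N - 1) * (n - 1) - 1, n - 1)

print(pct_mvc, pct_vc, pct_m)
```

The computed percentages fall close to the 1.2, 12.4, and .6 per cent points quoted in the text, so only (b) exceeds the 5% level.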

More than one of the sample criteria may be of interest in regard to a given
sample (see [5] pp. 267-268). For example, in an experiment such as that
described in Example 1 suppose the answer to question (a) is no. The experimenter
might then consider question (b); if the answer is no, the inconsistency
of the sample with $H_1(mvc)$ might be regarded as due to the variances or
covariances. If the answer to (b) is yes, the experimenter might then consider (c);
if the answer here is no, the inconsistency of the sample with $H_1(mvc)$ might be
regarded as due to the means. If, however, the answer here is yes, further study
might be required to “explain” the inconsistency.

10. Acknowledgement. The author wishes to express his gratitude to Professor
S. S. Wilks under whose direction this paper was written. The author
also wishes to thank Professor J. W. Tukey for many valuable conversations on
certain mathematical aspects of the problems and to thank Dr. W. E. Kappauf,
J. Allan Rafferty, M.D., and Mr. L. R. Tucker for assistance on applications
of the test criteria.





REFERENCES 

[1] K. Pearson, Tables of the Incomplete Beta Function, Cambridge University Press, 1932.

[2] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, American Mathematical
Society, 1943, pp. 9-12.

[3] Catherine M. Thompson, “Table of percentage points of the incomplete beta function,”
Biometrika, Vol. 32, Part II (1941), pp. 151-181.

[4] John W. Tukey and S. S. Wilks, “Approximation of the distribution of the product of
beta variables by a single beta variable,” Annals of Math. Stat., Vol. 17 (1946),
pp. 318-324.

[5] S. S. Wilks, “Sample criteria for testing equality of means, equality of variances,
and equality of covariances in a normal multivariate distribution,” Annals of
Math. Stat., Vol. 17 (1946), pp. 257-281.

[6] S. S. Wilks, “The large-sample distribution of the likelihood ratio for testing composite
hypotheses,” Annals of Math. Stat., Vol. 9 (1938), pp. 60-62.

[7] S. S. Wilks, “Certain generalizations in the analysis of variance,” Biometrika, Vol.
24 (1932), pp. 471-494.

[8] S. S. Wilks, “Moment-generating operators for determinants of product moments in
samples from a normal system,” Annals of Math., Vol. 35 (1934), pp. 312-340.

[9] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.



BRANCHING PROCESSES 1 

By T. E. Harris 

Project RAND , Douglas Aircraft Company 

1. Summary. This paper is concerned with a simple mathematical model
for a branching stochastic process. Using the language of family trees we may
illustrate the process as follows. The probability that a man has exactly $r$
sons is $p_r$, $r = 0, 1, 2, \dots$. Each of his sons (who together make up the first
generation) has the same probabilities of having a given number of sons of his
own; the second generation have again the same probabilities, and so on. Let
$z_n$ be the number of individuals in the $n$th generation. We study the probability
distribution of $z_n$. Some previous results are given in section 2; these include
procedures for computing moments of $z_n$, and a criterion for when the family
has probability 1 of dying out. In sections 3 and 4 the case is considered where
the family has a non-zero chance of surviving indefinitely. In this case the
random variables $z_n/Ez_n$ converge in probability to a random variable $w$ with
cumulative distribution $G(u)$. It is shown that $G(u)$ is absolutely continuous
for $u \ne 0$. Results of a Tauberian character are given for the behavior of $G(u)$
as $u \to 0$ and $u \to \infty$. In section 5 some examples are given where $G(u)$ can
be found explicitly; $G(u)$ is computed numerically for the case $p_1 = 0.4$, $p_2 = 0.6$.
In section 6 families with probability 1 of extinction are considered. A method
is given for obtaining in certain cases an expansion for the moment-generating
function of the number of generations before extinction occurs. In section 7
maximum likelihood estimates are obtained for the $p_r$ and for the expectation
$Ez_1$; consistency in a certain sense is proved. In section 8 a brief discussion
is given of the relation between two types of mathematical models for branching
processes.

2. Introduction. By a branching stochastic process is meant a phenomenon 
of the following general type: each of an initial aggregate of objects can give rise 
to more objects of the same or different types, the objects produced can then 
produce more, and the system develops, subject to certain probability laws. 
Examples are the development of human or animal populations, propagation of 
genes, and nuclear chain reactions. The mathematical model dealt with in this 
paper may be thought of as representing the generation-by-generation growth 
of a family, the fundamental random variable being the number of individuals 
in the nth generation. Under certain conditions, however, this model may 
describe the size of a family at a sequence of points in time. This question will 
be touched on in section 8. 

1 Based on a doctoral dissertation presented to the Mathematics Department, Princeton 
University, June, 1947. 




Definition 2.1. The random variables $z_n$, $n = 0, 1, 2, \dots$, will be said to
represent a simple discrete branching process provided: $z_0 = 1$; $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \dots$, with $\sum_{r=0}^{\infty} p_r = 1$; the conditional distribution of $z_{n+1}$, given
$z_n = r$, is that of the sum of $r$ independent random variables, each having the
same distribution as $z_1$.
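Definition 2.1 translates directly into a simulation. The Python sketch below draws one realisation of $z_0, z_1, \dots, z_n$; the function name `simulate_generations` and the choice of a seeded `random.Random` instance are illustrative choices, not part of the paper.

```python
import random


def simulate_generations(p, n_generations, rng):
    """Return [z_0, ..., z_n] for one realisation; p[r] = P(z_1 = r)."""
    sizes = [1]  # z_0 = 1
    outcomes = range(len(p))
    for _ in range(n_generations):
        current = sizes[-1]
        # Each of the `current` individuals independently draws its number of sons;
        # the next generation is the sum of these draws (0 if the family is extinct).
        sons = rng.choices(outcomes, weights=p, k=current)
        sizes.append(sum(sons))
    return sizes


# The case treated numerically in section 5: p_1 = 0.4, p_2 = 0.6.
path = simulate_generations([0.0, 0.4, 0.6], 6, random.Random(0))
print(path)
```

Because $p_0 = 0$ in this case, every simulated path is non-decreasing and the family never dies out.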

Assumptions. Throughout this paper we assume that $\sum_{r=0}^{\infty} r^2 p_r < \infty$, that at
least two of the $p_r$ are positive, and that $p_0 + p_1 < 1$.

Definitions 2.2. Let $x = Ez_1 = \sum r p_r$, $\sigma^2 = \mathrm{Var}(z_1) = \sum r^2 p_r - x^2$. Let
$f(s) = \sum_{r=0}^{\infty} p_r s^r$ be the generating function of $z_1$ ($s$ denotes a complex variable).
Let $p_{nr} = P(z_n = r)$ and $f_n(s) = \sum_{r=0}^{\infty} p_{nr} s^r$; of course $p_{1r} = p_r$ and $f_0(s) = s$. The
assumptions given above insure that the first and second derivatives $f'(s)$ and
$f''(s)$ are continuous in the set consisting of the interior of the unit circle and the
point $s = 1$; thus derivative notations such as $f''(1)$ are used even though $f(s)$
may not be analytic at $s = 1$. It will be seen shortly that a similar remark
applies to the functions $f_n(s)$ and certain functions to be introduced later.

In the remainder of this section we shall summarize certain results; most of 
them are contained implicitly or explicitly in works by Fisher [1], Lotka [2], 
Steffensen [3], Ulam and Hawkins [4], Kolmogoroff [5], Kolmogoroff and Dmitriev 
[6], and Yaglom [7]; some of these references are not widely available. 

From our definition, $P(z_{n+1} = k \mid z_n = j)$ is the coefficient of $s^k$ in $[f(s)]^j$.
Hence $p_{n+1,k}$ is the coefficient of $s^k$ in $\sum_{j=0}^{\infty} p_{nj}[f(s)]^j$, whence

(2.1) $f_{n+1}(s) = f_n[f(s)]$.

Letting $n = 1, 2, \dots$, successively, it follows that the generating function of $z_n$
is the $n$th functional iterate of $f(s)$. Hence

(2.2) $f_{n+1}(s) = f[f_n(s)]$.

We note that $f_n'(1) = Ez_n$, $f_n''(1) + f_n'(1) - [f_n'(1)]^2 = \mathrm{Var}(z_n)$. Differentiation
of (2.1) at $s = 1$ gives $f_{n+1}'(1) = x^{n+1}$; another differentiation gives $f_{n+1}''(1) =
f''(1) f_n'(1) + [f'(1)]^2 f_n''(1)$ while twofold differentiation of (2.2) gives $f_{n+1}''(1) =
f''(1)[f_n'(1)]^2 + f'(1) f_n''(1)$; these two expressions for $f_{n+1}''(1)$ can be equated and
solved for $f_n''(1)$, provided $x = f'(1) \ne 1$. Thus the mean and variance of $z_n$ are
given by $Ez_n = (Ez_1)^n = x^n$; $\mathrm{Var}(z_n) = \dfrac{\sigma^2 x^n (x^n - 1)}{x^2 - x}$, $x \ne 1$; $\mathrm{Var}(z_n) = n\sigma^2$,
$x = 1$. Higher moments, if they exist, may be found by a similar process.
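When $f(s)$ is a polynomial, the iterate $f_n(s)$ can be built exactly by repeated composition, which gives the full distribution of $z_n$ and a direct check of the moment formulas $Ez_n = x^n$ and $\mathrm{Var}(z_n) = \sigma^2 x^n (x^n - 1)/(x^2 - x)$. The following Python sketch does this with exact rational arithmetic for the case $p_1 = 0.4$, $p_2 = 0.6$; the helper names are illustrative.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(s)), evaluated by Horner's scheme."""
    result = [Fraction(outer[-1])]
    for coeff in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result


def generation_distribution(p, n):
    """Coefficients p_{n0}, p_{n1}, ... of f_n(s), using f_{n+1}(s) = f[f_n(s)]."""
    f_n = [Fraction(0), Fraction(1)]  # f_0(s) = s
    for _ in range(n):
        f_n = compose(p, f_n)
    return f_n


p = [Fraction(0), Fraction(2, 5), Fraction(3, 5)]
x = sum(r * pr for r, pr in enumerate(p))
sigma2 = sum(r * r * pr for r, pr in enumerate(p)) - x * x

n = 3
dist = generation_distribution(p, n)
mean_n = sum(r * q for r, q in enumerate(dist))
var_n = sum(r * r * q for r, q in enumerate(dist)) - mean_n**2

print(mean_n, x**n)
print(var_n, sigma2 * x**n * (x**n - 1) / (x**2 - x))
```

With rational arithmetic the two printed pairs agree exactly, not just to rounding error.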

Definition 2.3. Denote by $a$ the smallest non-negative real root of the
equation $t = f(t)$. We see that $x \le 1$ implies $a = 1$ while $x > 1$ implies
$0 \le a < 1$, the equality $a = 0$ holding if and only if $p_0 = 0$. In no case can the
half-open interval $0 \le t < 1$ contain more than one root. It is readily seen that

(2.3) $\lim_{n \to \infty} p_{n0} = \lim_{n \to \infty} f_n(0) = a$.





We thus have the well known result: the number $a$ is the probability of eventual
extinction of the family. The relation between $a$ and $x$ shows that the probability
of extinction is 1 if and only if $x \le 1$.
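Relation (2.3) also gives a practical way to compute the extinction probability: iterate $t \mapsto f(t)$ starting from $t = 0$. The Python sketch below does this; the function name, the stopping tolerance, and the two numerical cases are illustrative choices and do not come from the paper. In the critical case $x = 1$ the iteration converges only slowly, so the iteration cap then determines the accuracy.

```python
def extinction_probability(p, tol=1e-15, max_iter=100_000):
    """Approximate a = lim f_n(0), the smallest non-negative root of t = f(t)."""

    def f(t):
        return sum(pr * t**r for r, pr in enumerate(p))

    t = 0.0  # f_0(0) = 0
    for _ in range(max_iter):
        nxt = f(t)  # f_{n+1}(0) = f[f_n(0)]
        if abs(nxt - t) < tol:
            return nxt
        t = nxt
    return t


# Supercritical case: x = 0.25 + 2 * 0.5 = 1.25 > 1; the roots of t = f(t) are 0.5 and 1.
a_super = extinction_probability([0.25, 0.25, 0.5])
# Subcritical case: x = 0.3 + 2 * 0.2 = 0.7 < 1, so extinction is certain.
a_sub = extinction_probability([0.5, 0.3, 0.2])
print(a_super, a_sub)
```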

It is also clear that $0 \le t < 1$ implies $\lim f_n(t) = a$; this, together with (2.3),
shows that

(2.4) $\lim_{n \to \infty} p_{nr} = 0$, $r = 1, 2, \dots$.

Relation (2.4) means roughly that the family either dies out or gets very large.
In section 4 it will be shown that (2.4) holds uniformly in $r$.

Definition 2.4. The random variables $w_n$ are defined by $w_n = z_n/x^n$.

Clearly $Ew_n = 1$ and $Ew_n^2 = 1 + \dfrac{\sigma^2}{x^2 - x}\left(1 - \dfrac{1}{x^n}\right)$ if $x \ne 1$.

Suppose $n > m$. Then $E(z_n z_m) = \sum_r p_{mr} E(r z_n \mid z_m = r) = \sum_r p_{mr} r^2 x^{n-m} =
x^{n-m} E z_m^2$. Thus $E(w_n w_m) = E w_m^2$, whence

(2.5) $E(w_n - w_m)^2 = Ew_n^2 - Ew_m^2$, $n > m$.

By virtue of (2.5) we obtain

Theorem 2.1. If $x > 1$, the random variables $w_n$ converge in mean square,
hence in probability, to a random variable $w$.

For in this case $Ew_n^2 \to 1 + \dfrac{\sigma^2}{x^2 - x}$ as $n \to \infty$ and (2.5) shows that
$E(w_n - w_m)^2 \to 0$ as $n$ and $m \to \infty$. Theorem 2.1 is then a consequence of [8],
p. 38, I.

It is well known that convergence in mean square implies $Ew_n^2 \to Ew^2$ and
$E(w_n - 1)^2 \to E(w - 1)^2$ whence $Ew_n \to Ew$.
Thus we have

(2.6) $Ew = 1$, $Ew^2 = 1 + \dfrac{\sigma^2}{x^2 - x}$.

In order to study the behavior of z n for large n when x > 1, we consider the 
distribution of w. 

Definitions 2.5. $G_n(u) = P(w_n \le u)$; $\phi_n(s) = E(e^{s w_n}) = \int_{0-}^{\infty} e^{su}\, dG_n(u)$.

Definitions 2.6. (Applicable when $x > 1$.) $G(u) = P(w \le u)$; $\phi(s) =
E(e^{sw}) = \int_{0-}^{\infty} e^{su}\, dG(u)$. We shall refer to $G(u)$ as the asymptotic distribution
branching from $f(s)$.

The moment-generating functions (m.g.f.'s) $\phi_n(s)$ and $\phi(s)$ are defined at least
for Re $(s) \le 0$. Unless specifically stated otherwise we shall consider them only
in that domain.

From (2.2) and the fact that $\phi_n(s) = f_n[e^{s/x^n}]$ it follows that $\phi_{n+1}(sx) = f[\phi_n(s)]$.
Theorem 2.1 implies that if $x > 1$, $G_n(u) \to G(u)$ and $\phi_n(s) \to \phi(s)$ for Re $(s) \le 0$.
Thus the m.g.f. $\phi(s)$ satisfies the functional equation

(2.7) $\phi(sx) = f[\phi(s)]$, Re $(s) \le 0$.





Equation (2.7), which of course is applicable only when $x > 1$, was obtained in a
different form by Ulam and Hawkins. It belongs to a type usually known as
Koenigs' equation, after the nineteenth century mathematician who studied it
in connection with functional iteration, and is related to an equation studied by
Abel. We shall make some use of the work of Koenigs later. See Hadamard [9]
and Koenigs [10].

We note that $Ew^k < \infty$ if and only if $Ez_1^k < \infty$. It was already pointed out
that $Ew = 1$. As pointed out in [4], as many further moments of $w$ as exist
may be found by successive differentiation of (2.7) at $s = 0$.

Finally we note that $G_n(0) = p_{n0}$. Hence $\lim G_n(0) = a$. Thus $G(0) =
P(w = 0) \ge a$. We show later that $G(0) = a$. Clearly $G(u) = 0$ for $u < 0$.

In sections 3 and 4 we always assume x > 1. 


3. Asymptotic properties of the moment-generating function. We first
show that (2.7) uniquely determines the distribution of $w$. Specifically,

Theorem 3.1. Let $G_1(u)$ and $G_2(u)$ be distributions with equal first moments
and finite second moments whose characteristic functions $\phi_1(it)$ and $\phi_2(it)$ satisfy
($t$ is real) $\phi_r(itx) = f[\phi_r(it)]$, $r = 1, 2$. Then $G_1(u) = G_2(u)$.

From [13], p. 27, $\phi_1(it) - \phi_2(it) = t^2 \beta(t)$, where $\beta(t)$ is bounded as $t \to 0$.
From (2.7), $|\phi_1(itx) - \phi_2(itx)| = |f[\phi_1(it)] - f[\phi_2(it)]| \le x\,|\phi_1(it) - \phi_2(it)|$,
since $|f'(s)| \le x$ when $|s| \le 1$. Hence for $t \ne 0$, $|\beta(t)| \ge x\,|\beta(tx)|$. Thus
$\beta(t)$ cannot be bounded near $t = 0$ unless it is identically zero; hence
$\phi_1(it) = \phi_2(it)$.

It is clear that the requirement that $\phi(s)$ have the form $1 + s + O(s^2)$ between
two rays from the origin is sufficient for the uniqueness in that domain of
solutions of (2.7). On the other hand, continuous solutions can be constructed at
will if the existence of a derivative near $s = 0$ is not required.

Before proceeding further, it is convenient to define three functions $k(s)$,
$\psi(s)$, and $H(u)$ which are closely related to $f(s)$, $\phi(s)$, and $G(u)$ respectively. We
repeat that we are considering only the case $x > 1$. See definition 2.3 for $a$.

Definitions 3.1. Let $k(s) = \dfrac{f[(1 - a)s + a] - a}{1 - a}$. Clearly $k(s)$ is a probability
generating function with $k(0) = 0$, $k'(1) = f'(1) = x$, $k''(1) < \infty$. We
write $k(s) = \sum_{r=1}^{\infty} q_r s^r$. We also define the iterates $k_n(s)$ by

$k_0(s) = s$, $k_{n+1}(s) = k[k_n(s)]$.
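A small numerical illustration may help fix the role of $k(s)$. The Python sketch below assumes that Definition 3.1 reads $k(s) = (f[(1 - a)s + a] - a)/(1 - a)$ and that the exponent of Definition 3.3 is $\gamma = \log_x(1/q_1)$; both formulas are reconstructions from a damaged scan, and the function names and the example distribution are hypothetical.

```python
from fractions import Fraction
from math import log


def substitute_linear(p, c0, c1):
    """Coefficients of f(c0 + c1*s) for f(s) = sum_r p[r] s^r (Horner's scheme)."""
    result = [Fraction(p[-1])]
    for coeff in reversed(p[:-1]):
        shifted = [Fraction(0)] * (len(result) + 1)
        for i, r in enumerate(result):
            shifted[i] += r * c0       # multiply the running polynomial by c0 ...
            shifted[i + 1] += r * c1   # ... and by c1*s
        shifted[0] += coeff
        result = shifted
    return result


def conditional_pgf(p, a):
    """Coefficients q_r of k(s) = (f[(1 - a)s + a] - a) / (1 - a) (assumed form)."""
    q = substitute_linear(p, a, 1 - a)
    q[0] -= a
    return [c / (1 - a) for c in q]


# Hypothetical example: f(s) = 1/4 + s/4 + s^2/2, for which t = f(t) has roots 1/2 and 1.
p = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
a = Fraction(1, 2)
q = conditional_pgf(p, a)
x = sum(r * pr for r, pr in enumerate(p))
gamma = log(1 / q[1]) / log(x)  # assumed definition: gamma = log_x(1 / q_1)

print(q, gamma)
```

In this example $k(0) = 0$ and $k'(1) = x$, in line with the properties stated in Definitions 3.1.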


Definitions 3.2. Let $H(u)$ be the asymptotic distribution branching from $k(s)$.
(See Definition 2.6.) Let $\psi(s)$ be the corresponding moment-generating function.
We know then that $\psi(s)$ and $k(s)$ satisfy

(3.1) $\psi(sx) = k[\psi(s)]$.



In view of the uniqueness theorem we have, by direct substitution in (3.1), that
$\psi(s)$ must be given by

(3.2) $\psi(s) = \dfrac{\phi[(1 - a)s] - a}{1 - a}$,

and that $H(u)$ must be given by

(3.3) $H(u) = \dfrac{G\!\left(\dfrac{u}{1 - a}\right) - a}{1 - a}$, $u \ge 0$; $H(u) = 0$, $u < 0$.

We shall see later that $H(0) = 0$; i.e., that $G(0) = a$. Therefore $H(u)$ is the
conditional distribution of $(1 - a)w$, given that $w \ne 0$. Another way of stating
this is as follows:

Theorem 3.2. The random variable $w$ is distributed as the product of two
independent random variables $w_0 \cdot w'$, where $w_0$ takes the values 0 and $\dfrac{1}{1 - a}$ with
probabilities $a$ and $1 - a$ respectively while $w'$ has the asymptotic distribution branching
from $k(s)$.

For it is directly verifiable that $\phi(s)$ is the m.g.f. of $w_0 \cdot w'$.

In theorems 3.3 and 3.4 we consider the behavior of $\psi(s)$ for large $|s|$. To
make for smoother reading we defer the proofs till section 9, where somewhat
more general formulations are given. In section 4 the properties of $\psi(s)$ are
interpreted in terms of $G(u)$.

Definition 3.3. Let $\gamma = \log_x \dfrac{1}{q_1} = \log_x \left[\dfrac{1}{f'(a)}\right]$. (See definitions 2.3 and
3.1.) If $q_1 = 0$ (i.e., $p_0 = p_1 = 0$) we take $\gamma = \infty$.

Theorem 3.3. Suppose 7 < «. Then if Re (s) < 0 and s^O, 

(3.4) ^ + MM. 


M(s) is continuous for s ^ 0; M(s) and M 0 (s) satisfy respectively 

(3.5) M(sx) = M(s); M 0 (s) = 0 , \s | -► «. 

Remarks. (See section 9 for proof.) (a). Under the conditions of the theorem 
M{s) is real and positive when s is real and negative, (b) If Ez[ < 00 and the 
conditions of the theorem hold, the rth derivative of \p(s) satisfies 

(3.6) | t M (s) | = 0 . I» I -* * • 

(c) If 7 ~ <», ^(s) and as many derivatives as exist approach 0 exponentially 
as | s | —» 00 . 

We now consider the behavior of ^(s) on the positive real axis, provided it is 





Lemma 3.1. Let $f(s)$ be analytic in the circle $|s| < \alpha$, $\alpha > 1$. Then $\phi(s)$ and
$\psi(s)$ are analytic in some neighborhood of $s = 0$.

We use a theorem of Poincaré [11] which insures that there is exactly one
function $\Phi(s)$ analytic near $s = 0$ with $\Phi(0) = \Phi'(0) = 1$ and satisfying

$\Phi(sx) = f[\Phi(s)]$.

(Although Poincaré's proof is for the case $f(s)$ rational, it applies equally well
here.) The circle of convergence of the MacLaurin series for $\Phi(s)$ has radius $t_\alpha$
where $\Phi(t_\alpha) = \alpha$. An argument whose details are given in [12], p. 21, then
shows that $\phi(s) = \Phi(s)$ for $|s| < t_\alpha$, and Lemma 3.1 follows. (The argument
is necessary to rule out the possibility that the $\phi_n(s)$ converge to $\Phi(s)$ for
Re $(s) \le 0$ but to some other function for Re $(s) > 0$.) Clearly $\phi(s)$ and $\psi(s)$
are entire if and only if $f(s)$ is entire.

Lemma 3.1 is useful for actual computation of $G(u)$. The (non-negative)
coefficients $c_r$ in the series $\phi(s) = 1 + s + c_2 s^2 + \cdots$ can be determined by
differentiating (2.7) at $s = 0$. The series can be used to compute values of the
characteristic function $\phi(it)$ on some interval $t_0 \le t \le t_0 x$, where $t_0$ is a small real
number; the values of $\phi(it)$ for the remaining values of $t$ are determined by (2.7).
(Note that the real and imaginary parts of $\phi(it)$ are respectively even and odd.)
Then the usual inversion formula is used to obtain $G(u)$. A numerical example
of this procedure is worked out in section 5.
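The first step of this procedure, finding the coefficients $c_r$, amounts to matching powers of $s$ in $\phi(xs) = f[\phi(s)]$. Since $c_0 = 1$ and $f'(1) = x$, the coefficient of $s^r$ on the right equals $x c_r$ plus a term depending only on $c_1, \dots, c_{r-1}$, while the left side gives $x^r c_r$. The Python sketch below implements this recursion for polynomial $f$ with exact rationals; it is one way of carrying out the differentiation at $s = 0$, not necessarily the author's own scheme, and the function names are illustrative.

```python
from fractions import Fraction


def compose_truncated(outer, inner, degree):
    """Coefficients of outer(inner(s)) up to s^degree, by Horner's scheme."""
    result = [Fraction(outer[-1])]
    for coeff in reversed(outer[:-1]):
        product = [Fraction(0)] * (degree + 1)
        for i, a in enumerate(result):
            for j, b in enumerate(inner):
                if i + j <= degree:
                    product[i + j] += a * b
        product[0] += coeff
        result = product
    return result


def phi_coefficients(p, order):
    """Coefficients c_0..c_order of phi(s) with phi(x s) = f[phi(s)], c_0 = c_1 = 1."""
    x = sum(r * pr for r, pr in enumerate(p))
    c = [Fraction(1), Fraction(1)]
    for r in range(2, order + 1):
        trial = c + [Fraction(0)]  # c_r provisionally set to zero
        rest = compose_truncated(p, trial, r)[r]
        # Matching s^r: x^r c_r = x c_r + rest, because the c_r term enters with slope f'(1) = x.
        c.append(rest / (x**r - x))
    return c


# The case p_1 = 0.4, p_2 = 0.6 of section 5: x = 8/5, sigma^2 = 6/25.
p = [Fraction(0), Fraction(2, 5), Fraction(3, 5)]
c = phi_coefficients(p, 6)
print(c)
```

As a check, $2c_2$ should equal $Ew^2 = 1 + \sigma^2/(x^2 - x)$ from (2.6), which is $5/4$ in this case.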

Definition 3.4. The number $\rho$ is defined by $\rho = \log_x d$ if $f(s)$ is a polynomial
of degree $d$, $\rho = \infty$ otherwise.

Theorem 3.4. Let f(s) (and hence k(s)) be a 'polynomial of degree d . Then 
for s > 0 

- LM + /,(,); 

a 

L(s) is continuous and positive;L(s) and L 0 (s) satisfy respectively 
L{sx) = L(s); L 0 (s) = 0 , s °o. 

The proof is in section 9. (Theorem 3.4 may be compared with a more widely 
applicable but less precise result due to Shah [19].) 

Corollary. If $f(s)$ is a polynomial of degree $d$, $\psi(s)$ is an entire function of
order $\rho$ and type $C$ where $C = \mathrm{Max}\ L(s)$, $1 \le s \le x$.

An explicit determination for $C$ has not been found. An approximate numerical
determination is not difficult; the function $L(s) = \lim_{n \to \infty} \dfrac{\log k_n[\psi(s)]}{s^{\rho} d^{n}}$ can be
determined numerically for a number of values on some convenient interval
$s_0 \le s \le s_0 x$, and the maximum value approximated. The importance of $C$
will be indicated in the conjecture following Theorem 4.3. We may also mention
that the quantity $[\mathrm{Max}\ L(s) - \mathrm{Min}\ L(s)]$, $1 \le s \le x$, is of some interest.
Some numerical work indicates that in certain cases $L(s)$ is at least approximately
constant.





4. Some properties of $G(u)$. Since it will be convenient to work with $H(u)$
rather than $G(u)$, we state the content of Theorems 4.1, 4.2, and 4.3 in terms
of $G(u)$: $G(u) = a + \int_0^u g(v)\, dv$ for $u \ge 0$. The density $g(u)$ is continuous for
$u \ne 0$. If $Ez_1^k < \infty$ then $g^{(r)}(u)$ is continuous for $u \ne 0$ provided $r < \gamma + k - 1$
and is continuous for $u = 0$ provided $r < \gamma - 1$. Near $u = 0$, $G(u)$, provided
$\gamma < \infty$, approximates, in a certain mean sense made clear by Theorem 4.2, the
function $a + \dfrac{(1 - a)^{1 + \gamma}\, u^{\gamma}\, M[u(1 - a)]}{\Gamma(1 + \gamma)}$, where for convenience we have defined
$M(u)$ for positive $u$ by $M(u) = M(-u)$. It is then shown that in a certain sense
$g(u)$ goes to zero faster than $\exp(-u^{Q - \epsilon})$ and slower than $\exp(-u^{Q + \epsilon})$ where $\epsilon$ is
any positive number, $Q$ being defined in Theorem 4.3. A conjecture is given of a
more precise result, applicable when $f(s)$ is a polynomial: in the same sense $g(u)$
goes to zero (more, less) rapidly than ($\exp[-(A^* - \epsilon)u^Q]$, $\exp[-(A^* + \epsilon)u^Q]$),
where $A^*$ is defined in the conjecture.

Definition 4.1. Let $H'(u) = h(u)$.

Theorem 4.1. $H(u)$ is absolutely continuous.

Theorem 3.3 shows that $H(u)$ is continuous; see [13], p. 25. This incidentally
shows that $G(0) = a$. If $\gamma > \tfrac{1}{2}$ the absolute continuity of $H(u)$ follows from the
Plancherel theorem. See any text on Fourier transforms. In any case, define the functions

$$h_m(u) = \frac{1}{2\pi} \int_{-m}^{m} e^{-itu}\, \psi(it)\, dt, \qquad m = 1, 2, \dots.$$

An integration by parts$^2$ gives for $u \ne 0$

(4.1) $$h_m(u) = -\frac{1}{2\pi i u} \Big[e^{-itu}\, \psi(it)\Big]_{t=-m}^{t=m} + \frac{1}{2\pi u} \int_{-m}^{m} e^{-itu}\, \psi'(it)\, dt.$$

If $0 < u_1 \le u \le u_2$, (4.1), (3.4), and (3.6) show that the continuous functions
$h_m(u)$ converge uniformly in $[u_1, u_2]$ to a continuous function $h(u)$. Moreover

(4.2) $$H(u_2) - H(u_1) = \lim_{m \to \infty} \frac{1}{2\pi} \int_{-m}^{m} \frac{e^{-itu_1} - e^{-itu_2}}{it}\, \psi(it)\, dt = \lim_{m \to \infty} \int_{u_1}^{u_2} h_m(u)\, du = \int_{u_1}^{u_2} h(u)\, du,$$

the first equality in (4.2) following from [13], p. 28 and the second from the fact
that the $h_m(u)$ are uniformly bounded for $u_1 \le u \le u_2$. In case $Ez_1^k < \infty$
and $r < \gamma + k - 1$, repeated integration by parts of (4.1) and reference to
remark (b), Theorem 3.3, shows that the first $r$ derivatives of $h(u)$ are continuous
if $u \ne 0$. The usual integral expression for $h(u)$ in terms of $\psi(it)$ shows
that $\gamma > r + 1$ implies $h^{(r)}(u)$ is continuous at 0.

I am indebted to J. W. Tukey for this suggestion, which simplifies the original proof. 



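Corollary to the continuity of $H(u)$: the numbers $p_{nr} = P(z_n = r) \to 0$
uniformly in $r$, $r \ge 1$, as $n \to \infty$. We have

$$p_{nr} = \Big[G_n\Big(\frac{r}{x^n}\Big) - G\Big(\frac{r}{x^n}\Big)\Big] + \Big[G\Big(\frac{r}{x^n}\Big) - G\Big(\frac{r - 1}{x^n}\Big)\Big] + \Big[G\Big(\frac{r - 1}{x^n}\Big) - G_n\Big(\frac{r - 1}{x^n}\Big)\Big].$$

The desired result follows because $G_n(u) \to G(u)$ uniformly for $u \ge 0$ and because
$G(u)$ must be uniformly continuous for $0 \le u < \infty$ (right-continuity at 0).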

We next consider the behavior of H(u) near u = 0, when γ < ∞. Theorem 3.3 suggests what sort of result may be expected. If the function M(s) of Theorem 3.3 were a constant M it would follow from a Tauberian theorem due to Karamata (see [14], pp. 189-192) that $H(u) \sim \dfrac{M u^{\gamma}}{\Gamma(\gamma+1)}$ as u → 0+, or $\dfrac{H(u)}{u^{\gamma+1}} \sim \dfrac{M}{u\,\Gamma(\gamma+1)}$. Integrating both sides of this relation from u to ux would give

$$\int_u^{ux} \frac{H(v)}{v^{\gamma+1}}\,dv \sim \frac{1}{\Gamma(\gamma+1)}\int_u^{ux} \frac{M}{v}\,dv. \tag{4.3}$$

The analogue of (4.3) turns out to be true, as shown by Theorem 4.2, which shows that in a certain mean sense, H(u) behaves like $\dfrac{u^{\gamma} M(u)}{\Gamma(\gamma+1)}$ as u → 0+. (We defined M(u) = M(−u) for u > 0.)

Theorem 4.2.

$$\lim_{u\to 0+}\int_u^{ux} \frac{H(v)}{v^{\gamma+1}}\,dv = \frac{1}{\Gamma(\gamma+1)}\int_1^{x} \frac{M(v)}{v}\,dv.$$

The proof, which follows directly along the lines of the proof of Karamata's theorem, is sketched briefly in section 9, for a somewhat more general situation.

A corollary of Theorem 4.2 is that if γ < 1, h(u) cannot be bounded as u → 0+; for h(u) < K implies H(v) ≤ Kv, so that

$$\lim_{u\to 0+}\int_u^{ux} \frac{K v}{v^{\gamma+1}}\,dv \ge \frac{1}{\Gamma(\gamma+1)}\int_1^{x} \frac{M(v)}{v}\,dv > 0,$$

or

$$\lim_{u\to 0+} u^{1-\gamma} > 0,$$

which implies γ ≥ 1. An example to be given in section 5 shows that if γ = 1, h(u) is at least in certain cases bounded but discontinuous at 0.

In order to consider the behavior of H(u) as u → ∞ we first prove a theorem which applies to any distribution whose m.g.f. is an entire function.

Theorem 4.3.³ Let F(u) be any c.d.f. whose m.g.f. ξ(s) is entire. Let ρ be the order of ξ(s). Let Q be defined by

$$Q = \text{l.u.b. } q: \int_{-\infty}^{\infty} e^{|u|^{q}}\,dF(u) < \infty.$$

Then $\dfrac{1}{\rho} + \dfrac{1}{Q} = 1$.

The proof is given in section 9.

³ Before completing the present proof, the writer communicated this result to R. P. Boas, Jr., who sent back a proof along different lines.

Combining Theorems 3.4 and 4.3, we obtain immediately

Theorem 4.4. Let $Q = \text{l.u.b. } q: \int_0^{\infty} e^{u^{q}} h(u)\,du < \infty$. Then $Q = \dfrac{\rho}{\rho - 1}$.

Here ρ is given by definition 3.4. If f(s) is not a polynomial, whether entire or not, the proof of theorem 4.3 will show that Q = 1, and we interpret theorem 4.4 in that sense. The trivial case $f(s) = s^k$ is excluded, so ρ > 1.

Conjecture. Let ξ(s) of theorem 4.3 be of finite order ρ and of type C, 0 < C < ∞. Let $Q = \dfrac{\rho}{\rho - 1}$ and let $A = \text{l.u.b. } A': \int_{-\infty}^{\infty} e^{A'|u|^{Q}}\,dF(u) < \infty$. Then $(C\rho)^{Q}(AQ)^{\rho} = 1$.

The proof for the case ρ rational follows the same lines as the proof of Theorem 4.3; a general proof has not been found. If the conjecture is true then having determined ρ and Q, when k(s) is a polynomial, and having estimated C by the procedure indicated following the corollary to theorem 3.4, we obtain

$$A = \frac{1}{Q}\left(\frac{1}{C\rho}\right)^{1/(\rho-1)}, \tag{4.4}$$

where $A = \text{l.u.b. } A': \int_0^{\infty} e^{A' u^{Q}} h(u)\,du < \infty$. The corresponding number A* which applies to g(u) is given by

$$A^* = A(1 - a)^{Q}. \tag{4.5}$$


5. Some special cases. In this section we shall discuss some special cases in which the m.g.f. φ(s) and the c.d.f. G(u) may be determined explicitly. For these cases and for certain others there is a close relationship between the simple discrete branching process and another type of model to be discussed in section 8. Finally a numerical computation of the distribution G(u) will be given for a particular case where f(s) is a second degree polynomial.

Suppose f(s) has the form

$$f(s) = 1 - \frac{x}{\alpha} + \frac{x}{\alpha}\left(\frac{1}{1 + \alpha - \alpha s}\right)$$

with x > 1, α ≥ x − 1, where $f'(1) = x$ and $f''(1) + f'(1) = E z_1^2 = x(1 + 2\alpha)$. It is easily verified (as pointed out by Poincaré in [11]), that the solution of the equation $\phi(sx) = f[\phi(s)]$ is given by

$$\phi(s) = 1 + \frac{(x - 1)s}{x - 1 - \alpha s}$$

with $\phi(0) = \phi'(0) = 1$. The number a satisfying a = f(a) is given by $a = \dfrac{\alpha + 1 - x}{\alpha}$. The functions ψ(s) and k(s) of section 4 are given by

$$\psi(s) = \frac{1}{1 - s}, \qquad k(s) = \frac{s}{x - (x - 1)s}.$$

The number γ of Theorem 3.3 is 1. The density function h(u) (definition 4.1) is simply $e^{-u}$, as seen by direct calculation. The number Q of Theorem 4.3 is 1, as it should be, since f(s) is not an entire function. The c.d.f. H(u) is $1 - e^{-u}$, and H(u) ~ u near u = 0, in agreement with Theorem 4.2. Various aspects of the case f(s) = … have been discussed by numerous authors.
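The closed forms of this special case can be checked numerically. The sketch below assumes the fractional linear form f(s) = 1 − x/α + (x/α)/(1 + α − αs) with φ(s) = 1 + (x − 1)s/(x − 1 − αs); the parameter values x = 1.5, α = 2 and the helper names `f` and `phi` are illustrative choices, not taken from the paper.

```python
def f(s, x, alpha):
    """Fractional linear generating function assumed for this special case."""
    return 1.0 - x / alpha + (x / alpha) / (1.0 + alpha - alpha * s)

def phi(s, x, alpha):
    """Candidate solution of phi(s x) = f(phi(s)) with phi(0) = phi'(0) = 1."""
    return 1.0 + (x - 1.0) * s / (x - 1.0 - alpha * s)

x, alpha = 1.5, 2.0              # illustrative values with x > 1, alpha >= x - 1
a = (alpha + 1.0 - x) / alpha    # claimed root of a = f(a)
s = -0.4                         # any point with Re(s) <= 0 will do

lhs = phi(s * x, x, alpha)       # left side of the functional equation
rhs = f(phi(s, x, alpha), x, alpha)
print(a, f(a, x, alpha), lhs, rhs)
```

Both sides of the functional equation agree to rounding error, and f(a) returns a, which is what the text asserts for this family.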

Somewhat more generally, we may consider generating functions of the form

$$k(s) = s[x - (x - 1)s^{m}]^{-1/m}, \qquad x > 1. \tag{5.1}$$

The function k(s) is a generating function if and only if m is a non-negative integer. In this case we have $\phi(s) = \psi(s) = (1 - ms)^{-1/m}$ and

$$g(u) = h(u) = \frac{u^{(1/m) - 1} e^{-u/m}}{m^{1/m}\,\Gamma(1/m)}.$$

Here γ = 1/m, and we note that unless m = 1 the density function h(u) is unbounded near u = 0. A physical interpretation for this case will be given in section 8.

As a numerical illustration we consider the case $f(s) = 0.4s + 0.6s^2$. We have $x = E z_1 = 1.6$ and $\sigma^2 = E(z_1 - x)^2 = 0.24$. For the asymptotic distribution, Ew = 1, $E(w - 1)^2 = \dfrac{\sigma^2}{x^2 - x} = 0.25$. The number $\gamma = \log_{1.6}\left(\tfrac{5}{2}\right) = 1.9495$ so that ψ(s), which is identical with φ(s) in this case, is $O(|s|^{-1.9495})$ as |s| goes to ∞ with Re(s) ≤ 0. This implies that the c.d.f. H(u) and likewise G(u), since the two are equal here, behaves like [1/Γ(1 + γ)]M(u) times $u^{1.9495}$ near u = 0, where the "behavior" is in the sense of Theorem 4.2. Numerical determination of M(u) would not be difficult. The number ρ of Theorem 4.4 is given by $\log_{1.6} 2 = 1.4748$. This means that ψ(s) is an entire function of order 1.4748 and hence that the density function h(u) goes to zero more rapidly than $e^{-u^{Q-\epsilon}}$ and less rapidly than $e^{-u^{Q+\epsilon}}$ for any ε > 0, where $Q = \dfrac{\rho}{\rho - 1} = 3.1061$, and "more rapidly" is used in the sense of Theorem 4.4.

The function $L(s) = \lim_{n\to\infty} \dfrac{\log \psi(s x^{n})}{s^{\rho}\, 2^{n}}$ was computed for four values of s between s = 1 and s = x = 1.6; in each case the value was 0.744625 so that it appears
likely that here L(s) is constant. Hence C = Max L(s) = 0.744625 and the quantity A defined by (4.4) is 0.26430. Thus the conjecture following theorem 4.4 indicates that $\int_0^{\infty} g(u)\, e^{(0.26430 \pm \epsilon)u^{3.1061}}\,du$ is (divergent, convergent) according as the + or − sign holds.
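The constants quoted in this illustration follow from a few logarithms. The sketch below recomputes them for f(s) = 0.4s + 0.6s², taking γ = −log_x q₁ with q₁ = f′(0), ρ = log_x d with d the degree of f, Q = ρ/(ρ − 1), and A from (4.4); the value C = 0.744625 is the figure reported in the text, and the variable names are illustrative.

```python
import math

# Offspring generating function f(s) = 0.4 s + 0.6 s^2, coefficients p_0, p_1, p_2.
p = [0.0, 0.4, 0.6]

x = sum(r * pr for r, pr in enumerate(p))                 # mean E z_1
var = sum(r * r * pr for r, pr in enumerate(p)) - x * x   # variance of z_1
var_w = var / (x * x - x)                                 # variance of the limit w

q1 = p[1]                 # k'(0); here k = f because p_0 = 0
d = len(p) - 1            # degree of the polynomial f
gamma = math.log(1.0 / q1) / math.log(x)   # gamma = -log_x q_1
rho = math.log(d) / math.log(x)            # order of psi: log_x d
Q = rho / (rho - 1.0)                      # Theorem 4.4

C = 0.744625                               # value of L(s) reported in the text
A = (1.0 / Q) * (1.0 / (C * rho)) ** (1.0 / (rho - 1.0))   # equation (4.4)

print(x, var, var_w, gamma, rho, Q, A)
```

Carrying full precision in ρ gives Q slightly above the printed 3.1061, which is consistent with the paper having rounded ρ to 1.4748 before forming ρ/(ρ − 1).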

Through the kindness of Mr. Cecil Hastings of the Douglas Aircraft Company, the c.d.f. G(u) was computed for this case. The coefficients in the power series expansion of φ(s) were obtained from the functional equation (3.1) and G(u) was then obtained by inverting φ(s). The values of G(u) are given in Table I.
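The first step of that computation can be sketched as follows, assuming (3.1) is the functional equation φ(sx) = f[φ(s)] with φ(0) = φ′(0) = 1. Matching coefficients of sʲ gives c_j(xʲ − x) = (terms in c₁, …, c_{j−1}), so the coefficients are obtained one at a time. The function names are illustrative, and the inversion step that produced Table I is not reproduced here.

```python
def series_mul(a, b, n):
    """Product of two power series, truncated to n coefficients."""
    out = [0.0] * n
    for i, ai in enumerate(a[:n]):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: n - i]):
            out[i + j] += ai * bj
    return out

def poly_of_series(p, c, n):
    """Coefficients of f(phi(s)) for f(s) = sum p[r] s^r, by Horner's rule."""
    out = [0.0] * n
    for coef in reversed(p):
        out = series_mul(out, c, n)
        out[0] += coef
    return out

def phi_coefficients(p, n):
    """Coefficients c_0..c_{n-1} of phi with phi(x s) = f(phi(s)), c_0 = c_1 = 1."""
    x = sum(r * pr for r, pr in enumerate(p))
    c = [0.0] * n
    c[0], c[1] = 1.0, 1.0
    for j in range(2, n):
        rest = poly_of_series(p, c, j + 1)[j]   # c[j] is still 0 at this point
        c[j] = rest / (x ** j - x)
    return c

def evaluate(c, s):
    """Value of the truncated series at s."""
    return sum(cj * s ** j for j, cj in enumerate(c))

p = [0.0, 0.4, 0.6]
c = phi_coefficients(p, 30)
s = -0.5
residual = evaluate(c, 1.6 * s) - (0.4 * evaluate(c, s) + 0.6 * evaluate(c, s) ** 2)
print(c[:4], residual)
```

The second coefficient reproduces E(w²)/2 = 1.25/2, in agreement with the variance 0.25 quoted above.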





6. Number of generations to extinction. It was pointed out in section 2 that when x < 1 the probability is 1 that $z_n = 0$ for some integer n. We assume throughout section 6 that x < 1.


TABLE I

G(u), the limiting probability that $z_n/x^n < u$, for the case $f(s) = 0.4s + 0.6s^2$

   u       G(u)
  0.00    .00000
  0.25    .04753
  0.50    .17275
  0.75    .34550
  1.00    .53117
  1.25    .69932
  1.50    .83042
  1.75    .91857
  2.00    .96781
  2.50    .99751
  3.00    .99993


Definitions 6.1. Let the random variable N be the smallest integer n such that $z_{n+1} = 0$. Define the moment-generating function of N by

$$\theta(s) = \sum_{n=0}^{\infty} e^{ns} P(N = n).$$

Clearly $P(N = n) = p_{n+1,0} - p_{n0}$, so that $\theta(s) = \sum_{n=0}^{\infty} e^{ns}(p_{n+1,0} - p_{n0})$.

Definitions 6.2. Let $b_n = 1 - p_{n+1,0}$, with $b_0 = 1 - p_0$. The numbers $b_n$ satisfy the recursive relation

$$b_{n+1} = 1 - f(1 - b_n). \tag{6.1}$$

Define the function $\theta_1(s)$ by

$$\theta_1(s) = \sum_{n=0}^{\infty} b_n e^{ns}.$$

We see that

$$\theta(s) = 1 + (e^{s} - 1)\theta_1(s), \tag{6.2}$$

so that it suffices to determine the function $\theta_1(s)$.

The function $\theta_1(s)$ belongs to a type which has been studied by Fatou [15] and Lattès [16]. If we let $e^{s} = z$ we see that $\theta_1$ is a power series in z whose coefficients are successive iterates of the function $f^*(b) = 1 - f(1 - b)$; i.e., $b_{n+1} = f^*(b_n) = f^*_{n+1}(b_0)$, where $f^*(0) = 0$, $f^{*\prime}(0) = x < 1$. It was shown by Fatou that a function of this sort is meromorphic with poles at s = −n log x, n = 1, 2, ···. An expansion for $\theta_1(s)$ in the form

$$\theta_1(s) = \frac{y_0}{1 - xe^{s}} + \frac{\mu_2 y_0^2}{1 - x^2 e^{s}} + \frac{\mu_3 y_0^3}{1 - x^3 e^{s}} + \cdots$$

was obtained by Lattès, the expansion converging everywhere except at the poles. The quantities $\mu_r$ and $y_0$ are defined as follows: the function $\mu(s) = \mu_1 s + \mu_2 s^2 + \cdots$ is determined by the functional equation $\mu(sx) = f^*[\mu(s)]$ with the condition $\mu'(0) = \mu_1 = 1$. The number $y_0$ is determined by $\mu(y_0) = b_0 = 1 - p_0$. Perhaps the easiest way to determine $y_0$ is to use the fact that the inverse function $\mu^{-1}(s)$ satisfies the functional equation $\mu^{-1}[f^*(s)] = x\mu^{-1}(s)$, from which we can determine the power series for $\mu^{-1}(b_0)$.

Since the use of Lattès' expansion requires finding the expansions of μ(s) and $\mu^{-1}(s)$, we now give another method, giving a different kind of expansion; this method appears particularly adapted to the case here illustrated, where f(s) is of the second degree. Then (6.1) becomes

$$b_{n+1} = xb_n - p_2 b_n^2, \qquad b_0 = 1 - p_0. \tag{6.3}$$

Definition 6.3. The functions $\theta_k(s)$, k = 1, 2, ···, are given by

$$\theta_k(s) = \sum_{n=0}^{\infty} (b_n)^{k} e^{ns}. \tag{6.4}$$

If we raise both sides of (6.3) to the kth power, multiply both sides by $e^{ns}$, sum on n from 0 to ∞, and solve for $\theta_k(s)$, we obtain

$$\theta_k(s) = \frac{b_0^{k} e^{-s} + \sum_{j=1}^{k} \binom{k}{j} (-p_2)^{j} x^{k-j}\, \theta_{k+j}(s)}{e^{-s} - x^{k}}. \tag{6.5}$$

(Justification for the rearrangement of series will come out of the subsequent proof.) If we put k = 1 in (6.5) we obtain

$$\theta_1(s) = \frac{b_0 e^{-s}}{e^{-s} - x} - \frac{p_2\,\theta_2(s)}{e^{-s} - x}. \tag{6.6}$$

Definitions 6.4. We define recursively sequences of functions $S_n(s)$ and $R_n(s)$, such that for each n, $\theta_1(s) = S_n(s) + R_n(s)$. Let

$$S_1(s) = \frac{b_0 e^{-s}}{e^{-s} - x}, \qquad R_1(s) = -\frac{p_2\,\theta_2(s)}{e^{-s} - x}.$$

Suppose now that $R_n(s)$ is of the form $A_{n1}\theta_{n+1}(s) + \cdots + A_{nn}\theta_{2n}(s)$, the $A_{ni}$ being functions of s, $p_2$, and x, but not explicitly of $b_0$; while $S_n(s)$ is a rational function of $e^{-s}$, $p_2$, and x, and a polynomial of degree n in $b_0$. Now put k = n + 1 in (6.5) and substitute the expression obtained for $\theta_{n+1}(s)$ into $R_n(s)$. Collecting terms we now define $R_{n+1}(s)$ as the sum of terms involving $\theta_{n+2}(s), \cdots, \theta_{2n+2}(s)$: $R_{n+1}(s) = A_{n+1,1}\theta_{n+2}(s) + \cdots + A_{n+1,n+1}\theta_{2n+2}(s)$; then $S_{n+1}(s) = \theta_1(s) - R_{n+1}(s)$ is a rational function of $e^{-s}$, $p_2$, and x, and a polynomial of degree n + 1 in $b_0$.

Theorem 6.1. Let $f(s) = p_0 + p_1 s + p_2 s^2$, with x < 1. Suppose that $x + p_2 b_0 < 1$. Then the functions $S_n(s)$ converge to $\theta_1(s)$ in a neighborhood of s = 0.

The restriction $x + p_2 b_0 < 1$ may fail to hold. However this is not a serious restriction; we pick a value of n so that $x + p_2 b_n < 1$. Then

$$\theta_1(s) = b_0 + b_1 e^{s} + \cdots + b_{n-1}e^{(n-1)s} + e^{ns}\theta_1^*(s),$$

where $\theta_1^*(s) = \sum_{r=0}^{\infty} b_{n+r} e^{rs}$ is the same type of function as $\theta_1(s)$; theorem 6.1 is then applicable to $\theta_1^*(s)$.

If the conditions of theorem 6.1 are satisfied, we have

$$\theta_1(s) = b_0 e^{-s}\pi_1(s, x) - p_2 b_0^2 e^{-s}\pi_2(s, x) + 2xp_2^2 b_0^3 e^{-s}\pi_3(s, x) - p_2^3 b_0^4 e^{-s}(e^{-s} + 5x^3)\pi_4(s, x) + \cdots, \tag{6.7}$$

where $\pi_r(s, x) = \prod_{j=1}^{r} \dfrac{1}{e^{-s} - x^{j}}$. Since $E(N) = \theta'(0) = \theta_1(0)$ and $E(N^2) = \theta''(0) = 2\theta_1'(0) + \theta_1(0)$, we have

$$E(N) = b_0\pi_1(0, x) - p_2 b_0^2\pi_2(0, x) + 2xp_2^2 b_0^3\pi_3(0, x) - p_2^3 b_0^4(1 + 5x^3)\pi_4(0, x) + \cdots,$$

$$E(N^2) = -E(N) + 2\big[b_0\pi_1'(0, x) - p_2 b_0^2\pi_2'(0, x) + 2xp_2^2 b_0^3\pi_3'(0, x) - (5x^3 + 1)p_2^3 b_0^4\pi_4'(0, x) + p_2^3 b_0^4\pi_4(0, x) + \cdots\big],$$

where $\pi_r'(0, x) = \pi_r(0, x)\sum_{j=1}^{r} \dfrac{1}{1 - x^{j}}$.
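As a rough check on the expansion for E(N), the sketch below compares the first four terms of the series at s = 0 with the direct sum E(N) = θ₁(0) = Σ bₙ obtained from the recursion (6.3). The quadratic offspring law p₀ = 0.5, p₁ = 0.3, p₂ = 0.2 (so x = 0.7, b₀ = 0.5 and x + p₂b₀ = 0.8 < 1) is a hypothetical example, and the function names are illustrative.

```python
def mean_generations_direct(x, p2, b0, n_terms=2000):
    """E(N) = sum of b_n, with b_{n+1} = x b_n - p2 b_n^2 as in (6.3)."""
    b, total = b0, 0.0
    for _ in range(n_terms):
        total += b
        b = x * b - p2 * b * b
    return total

def mean_generations_series(x, p2, b0):
    """First four terms of the expansion of theta_1(0) in powers of p2."""
    def pi(r):
        # pi_r(0, x) = product over j = 1..r of 1 / (1 - x^j)
        out = 1.0
        for j in range(1, r + 1):
            out /= 1.0 - x ** j
        return out

    return (b0 * pi(1)
            - p2 * b0 ** 2 * pi(2)
            + 2.0 * x * p2 ** 2 * b0 ** 3 * pi(3)
            - p2 ** 3 * b0 ** 4 * (1.0 + 5.0 * x ** 3) * pi(4))

p0, p1, p2 = 0.5, 0.3, 0.2     # hypothetical quadratic offspring law
x = p1 + 2.0 * p2              # mean, 0.7
b0 = 1.0 - p0

direct = mean_generations_direct(x, p2, b0)
series = mean_generations_series(x, p2, b0)
print(direct, series)
```

For these values the four-term partial sum falls short of the direct sum by a few thousandths, the size one would expect from the first omitted term.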

We now prove that if $x + p_2 b_0 < 1$, the expansion (6.7) is valid in some neighborhood of s = 0. We shall denote the particular values of x, $p_2$, and $b_0$ with which we are dealing by $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$. Now let x, $p_2$, and $b_0$ be three complex numbers, arbitrary except for the following restrictions:

$$|x| + |p_2| < 1, \qquad |b_0| < 1 \tag{6.8}$$

and define the numbers $b_n$ in terms of $b_0$, x, and $p_2$, by means of (6.3), with $\theta_k(s)$ defined by (6.4).

We first show that (6.7) is valid if (6.8) holds, and then show that the domain of validity also includes the original numbers $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$, provided $\bar{x} + \bar{p}_2\bar{b}_0 < 1$.

If (6.8) is satisfied, we have $|b_n| < A|x|^{n}$ where A is a positive constant. Now suppose $1 < T < 1/|x|$. Then the series defining $\theta_k(s)$, k = 1, 2, ···, are uniformly and absolutely convergent in the domain $|e^{s}| \le T$. Moreover, if $|x| + |p_2| = \Lambda < 1$, we have $|b_n| \le |b_0|\Lambda^{n}$ whence, if k is an integer large enough so that $T\Lambda^{k} < \tfrac{1}{2}$,

$$|\theta_k(s)| < 2|b_0|^{k} \tag{6.9}$$

for $|e^{s}| \le T$. In what follows, we assume $|e^{s}| \le T$. Now write $\theta_1(s) = S_n(s) + \sum_{j=1}^{n} A_{nj}(p_2, x, s)\theta_{n+j}(s)$, where n is large enough so that $T\Lambda^{n} < \tfrac{1}{2}$. Let $A_n(p_2, x, s) = \max_{1 \le j \le n} |A_{nj}(p_2, x, s)|$. Passing to the next stage we see that

$$A_{n+1} \le A_n + \frac{|A_{n1}|\,\Lambda^{n+1}}{|e^{-s} - x^{n+1}|} \le A_n\left(1 + \frac{\Lambda^{n+1}}{|e^{-s} - x^{n+1}|}\right).$$

Hence the numbers $A_n$ are bounded. This fact, together with (6.9), shows that $\lim_{n\to\infty} R_n(s) = 0$.

Now suppose that x and $b_0$ have their original values $\bar{x}$ and $\bar{b}_0$ while $p_2$ is small enough in absolute value so that $\bar{x} + |p_2| < 1$. In this case $\lim_{n\to\infty} S_n(s) = \theta_1(s)$.

We observe that $S_n(s)$ is a polynomial of degree n − 1 in $p_2$ and that $S_{n+1}(s)$ is obtained from $S_n(s)$ by adding a single term of degree n in $p_2$. Thus $\theta_1(s)$ has been expressed as a power series in $p_2$. Now consider $\theta_1(s)$ as a function of $p_2$, with $b_0 = \bar{b}_0$, $x = \bar{x}$. If $\bar{x} + \bar{b}_0|p_2| < 1$, we have $b_n = O[(\bar{x})^{n}]$. Thus $\theta_1(s)$ is analytic in $p_2$ for $|p_2| < \dfrac{1 - \bar{x}}{\bar{b}_0}$ and the expansion in (6.7), being a power series in $p_2$, must be valid when $\bar{x} + \bar{p}_2\bar{b}_0 < 1$.


7. Estimation of parameters. Until now we have assumed that the parameters $p_r$ are known numbers. We may wish, however, to estimate them, having observed the numbers $z_1, z_2, \cdots, z_{n+1}$. In order to get simple maximum likelihood estimates for the $p_r$, it appears necessary to introduce certain auxiliary random variables.

Definitions 7.1. Let $z_{mk}$ be the number of individuals in the mth generation who have exactly k descendents in the (m + 1)st generation. Let $Z_n = 1 + z_1 + \cdots + z_n$.

Theorem 7.1. Maximum likelihood estimates of $p_r$ and x, based on observed values of $z_{mk}$ for m ≤ n, are respectively

$$\hat{p}_r = \sum_{m=0}^{n} z_{mr}/Z_n, \qquad \hat{x} = (Z_{n+1} - 1)/Z_n.$$

(Note that the estimate $\hat{x}$ involves only $z_1, \cdots, z_{n+1}$.)

If $z_m$ is fixed the joint conditional probability function of $z_{m0}, z_{m1}, \cdots$ is $(z_m)!\prod_{r=0}^{\infty} p_r^{z_{mr}} \Big/ \prod_{r=0}^{\infty} (z_{mr})!$. Thus the joint probability function of the $z_{mr}$ for m = 0, 1, ···, n, and r = 0, 1, 2, ···, is given by the product of two factors, one of which is independent of the $p_r$, the logarithm of the other being $\sum_{r}\left(\sum_{m} z_{mr}\right)\log p_r$. The value of this expression is clearly maximized by taking $p_r = \hat{p}_r$ as given above. Since $\sum_{r} z_{mr} = z_m$ and $\sum_{r} r z_{mr} = z_{m+1}$, the quantity $\sum_{r} r\hat{p}_r$ gives $\hat{x}$ as above.
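The estimates of Theorem 7.1 are simple to compute from a record of family sizes. The sketch below uses a small hypothetical family tree (not data from the paper) and checks that Σ r p̂ᵣ coincides with (Z_{n+1} − 1)/Zₙ; the function name and data layout are illustrative choices.

```python
from collections import Counter

def mle_from_family_sizes(family_sizes):
    """Estimate p_r and x from observed family sizes.

    family_sizes[m] lists the number of children of each individual in
    generation m, for m = 0..n, so len(family_sizes[m]) = z_m.
    """
    counts = Counter()
    for generation in family_sizes:
        counts.update(generation)                 # counts[r] = sum over m of z_{mr}
    Z_n = sum(len(g) for g in family_sizes)       # 1 + z_1 + ... + z_n
    p_hat = {r: c / Z_n for r, c in sorted(counts.items())}
    x_hat = sum(r * pr for r, pr in p_hat.items())
    return p_hat, x_hat, Z_n

# Hypothetical record with n = 2: z_0 = 1, z_1 = 2, z_2 = 4, z_3 = 5.
family_sizes = [[2], [1, 3], [0, 2, 2, 1]]
p_hat, x_hat, Z_n = mle_from_family_sizes(family_sizes)

z_next = sum(family_sizes[-1])                    # z_{n+1}
Z_next = Z_n + z_next                             # Z_{n+1}
print(p_hat, x_hat, (Z_next - 1) / Z_n)
```

Here Z₂ = 7 and Z₃ = 12, so both expressions for the estimate of x equal 11/7.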

Although the estimates $\hat{p}_r$ are the same as we would obtain if we were dealing with $Z_n$ trials from a multinomial distribution with probabilities $p_r$, the joint distribution of the quantities $\sum_{m=0}^{n} z_{mr}$, r = 0, 1, ···, is not multinomial. For example, if $Z_n > 1$ the probability of the event

$$\left\{\sum_{m} z_{m0} = Z_n, \quad \sum_{m} z_{mr} = 0 \text{ for } r \ne 0\right\}$$

is 0.


We shall next show that the estimate $\hat{x}$ is, in a certain sense, consistent.

Theorem 7.2. If x > 1, the random variables $Z_{n+1}/Z_n$ converge in probability to the random variable xV* where V* = 1/x if w = 0 and V* = 1 if w ≠ 0.

If w ≠ 0 then for all n, $z_n \ne 0$ and $1/Z_n \to 0$ as n → ∞. Hence in this case $(Z_{n+1} - 1)/Z_n$ converges to x if $Z_{n+1}/Z_n$ does. On the other hand, P(w = 0) = a = P($z_n = 0$ for some n), so that if w = 0, $Z_{n+1}/Z_n = 1$ with probability 1 for n large enough. Thus we need only show that $Z_{n+1}/Z_n$ converges to x if x > 1 and w ≠ 0.

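We need the following:

Lemma 7.1. If x > 1, the random variables $Z_n/x^{n}$ converge in probability to $\dfrac{wx}{x - 1}$.

Since

$$\frac{wx}{x - 1} - \frac{Z_n}{x^{n}} = \frac{w}{x^{n+1}}\left(\frac{x}{x - 1}\right) + \sum_{r=0}^{n} \frac{1}{x^{n-r}}(w - w_r), \tag{7.1}$$

it will be sufficient to show that $\lim_{n\to\infty} \dfrac{1}{x^{2n+2}}E(w^2) = 0$ and $\lim_{n\to\infty} \delta^2 = 0$, where $\delta^2 = E\left[\sum_{r=0}^{n} \dfrac{w - w_r}{x^{n-r}}\right]^2$. The truth of the first statement is obvious, since $Ew^2$ is finite. It follows from (2.5) that $E(w_r w_s) = Ew_r^2$ if s > r, $E(ww_r) = \lim_{n\to\infty} E(w_n w_r) = Ew_r^2$, whence $E(w - w_r)^2 = \dfrac{\sigma^2}{(x^2 - x)x^{r}}$ and $E[(w - w_r)(w - w_s)] = \dfrac{\sigma^2}{(x^2 - x)x^{s}}$ if s > r. Then

$$\delta^2 \le \frac{\sigma^2}{(x^2 - x)x^{2n}}\left[\sum_{r=0}^{n} x^{r} + 2\sum_{s=1}^{n}\sum_{r=0}^{s-1} x^{r}\right]$$

and this quantity clearly approaches 0 as n → ∞, proving Lemma 7.1.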





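Define the random variables w* and $V_n$ as

w* = w when w ≠ 0,

w* = 1 when w = 0,

$V_n = \dfrac{Z_n}{x^{n}}$ when $z_n \ne 0$,

$V_n = \dfrac{x}{x - 1}$ when $z_n = 0$.

It is clear that the $V_n$ converge in probability to $w^*\dfrac{x}{x - 1}$ and we note that the c.d.f. of w* is continuous at w* = 0. Hence,

$$\lim_{n\to\infty} P\left(\left|\frac{V_{n+1}}{V_n} - 1\right| > \epsilon > 0\right) = \lim_{n\to\infty} P(V_{n+1} - V_n \pm \epsilon V_n \lessgtr 0) = P\left(\pm\epsilon\,\frac{w^* x}{x - 1} \lessgtr 0\right) = 0.$$

It follows, under the conditional hypothesis w ≠ 0, that the variates $Z_{n+1}/Z_n$ converge in probability to x, since

$$\frac{V_{n+1}}{V_n} = \frac{1}{x}\,\frac{Z_{n+1}}{Z_n} \quad \text{when } z_{n+1} \ne 0.$$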

8. Continuous models. As mentioned in section 1 there are situations where 
it is more important to consider the number of individuals existing at a given 
time than the number in a given generation. Let a set of probabilities p r be 
given. The question arises whether we can interpret these as probabilities that 
an individual will have a given number of descendents at the end of some fixed 
period of time. We might then suppose that each individual in existence at 
that time has the same probabilities of having a given number of descendents at 
the end of the next (equal) length of time, these probabilities being independent 
of the age of the individual. A model of this sort might be considered in certain 
fission processes, if the probability of fission is independent of age. It should 
be noted that the “descendents” of an individual may include the individual. 
For example, if a bacterium splits in two we may either regard it as having pro¬ 
duced two descendents and dying, or as having produced one descendent and 
itself surviving.

If an interpretation of this sort is to be satisfactory, interpolation in time must be possible. In other words there should exist a family of functions $f_n(s)$ defined for all positive n such that $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$; such that for each positive n, $f_n(s)$ is a probability generating function, $f_n(s) = \sum_{r=0}^{\infty} p_r(n)s^{r}$; and such that for n = 0, 1, 2, ··· the functions $f_n(s)$ coincide with the iterates s, f(s), f[f(s)], ···. We may then interpret $f_n(s)$ as the generating function at time n. It is readily seen that in general such a family of functions will not exist. For example, if such a family exists we must have f(s) = nth iterate of $f_{1/n}(s)$ for arbitrarily large integral n, so that f(s) cannot be a polynomial of degree ≥ 2.

The functional equation $\phi(sx) = f[\phi(s)]$ shows that $f(s) = \phi[x\phi^{-1}(s)]$, whence $f_n(s) = \phi[x^{n}\phi^{-1}(s)]$ for integral n. The expression $\phi[x^{n}\phi^{-1}(s)]$ then might be taken as the definition of $f_n(s)$ for all positive n. See Hadamard, [9]. The problem of determining whether the functions so defined are a family of generating functions will be discussed in a subsequent paper. We remark, however, that if f(s) has the form $\dfrac{s}{x - (x - 1)s}$ considered in section 5 then the iterates $f_n(s)$ have the form $\dfrac{s}{x^{n} - (x^{n} - 1)s}$; they are clearly generating functions for all positive n, satisfying the required relation $f_{n_1}(f_{n_2}) = f_{n_1+n_2}$.


Now suppose g(s) is some function such that the function $f(s) = g^{-1}\left[\dfrac{g(s)}{x - (x - 1)g(s)}\right]$ is a generating function for all x > 1, with g(1) = 1. As pointed out by Ulam and Hawkins, the iterates of functions f(s) of this form are convenient to work with, the nth iterate being simply $g^{-1}\left[\dfrac{g(s)}{x^{n} - (x^{n} - 1)g(s)}\right]$. In addition, the requirement that f(s) be a generating function for all x > 1 shows that the functions $f_n(s)$ are generating functions for all n > 0. The simplest function g(s) which satisfies our requirements is $g(s) = s^{m}$, where m is any positive integer. In this case f(s) has the form considered in (5.1) and $f_n(s) = s[x^{n} - (x^{n} - 1)s^{m}]^{-1/m}$. As n → 0 we have

$$f_n(s) = \left(1 - \frac{n\log x}{m}\right)s + \frac{n\log x}{m}\,s^{m+1} + O(n^2).$$

We may interpret this as follows. A particle in existence at a given time may, in a short time interval Δt, either split into m + 1 particles, with probability $\dfrac{\Delta t\log x}{m}$; or it may remain unaltered, with probability $1 - \dfrac{\Delta t\log x}{m}$. If it splits, each particle produced has the same chances for splitting as its parent, etc. Thus, from the results of section 5, it follows that if we begin with a single particle at time t = 0, the asymptotic probability density function for $z_t/x^{t}$, where $z_t$ is the number of particles at time t, is given by $\left(m^{-1/m}u^{(1/m)-1}e^{-u/m}\right)/\Gamma(1/m)$.
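The interpolation property for the family with g(s) = sᵐ can be checked numerically. The sketch below evaluates f_t(s) = s[xᵗ − (xᵗ − 1)sᵐ]^(−1/m) for non-integral t and compares a composition with the corresponding sum of times, together with the small-t expansion; the parameter values and the function name `f_t` are illustrative.

```python
import math

def f_t(s, t, x, m):
    """f_t(s) = s [x^t - (x^t - 1) s^m]^(-1/m), defined for any real t >= 0."""
    xt = x ** t
    return s * (xt - (xt - 1.0) * s ** m) ** (-1.0 / m)

x, m, s = 1.6, 2, 0.7            # illustrative values
t1, t2 = 0.3, 1.25               # non-integral times

composed = f_t(f_t(s, t2, x, m), t1, x, m)
direct = f_t(s, t1 + t2, x, m)

# First-order behaviour as t -> 0: stay with probability 1 - t log(x)/m,
# split into m + 1 particles with probability t log(x)/m.
t = 1e-4
rate = t * math.log(x) / m
linearized = (1.0 - rate) * s + rate * s ** (m + 1)
print(composed, direct, f_t(s, t, x, m) - linearized)
```

The composition agrees with the direct evaluation to rounding error, and the remainder in the last printed value is of order t², as the expansion in the text requires.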

It is, of course, customary to begin with the elementary probabilities for a certain number of births in a short time Δt and determine the functions $f_n(s)$ from these by means of differential equations. See, for example, Arley, [17]. The results of the present paper can be applied in some cases to the continuous problem even when an explicit determination of the $f_n(s)$ is difficult. A discussion will be given in a later paper.






9. Some proofs. We give in this section proofs for (A) theorem 3.3, (B) theorem 3.4, (C) theorem 4.2, and (D) theorem 4.3; in certain cases we shall indicate slightly more general results.

(A) We make use of a result of Koenigs, in the form applicable here.

Koenigs' theorem: If |s| ≤ λ < 1 and $q_1 \ne 0$, then $k_n(s) = q_1^{n}B(s)[1 + O(q_1^{n})]$ where B(s) is analytic for |s| ≤ λ and satisfies the functional equation $B[k(s)] = q_1 B(s)$.

Here, $O(q_1^{n})$ means bounded by $Aq_1^{n}$, where A is independent of s. We remark that B(s) ≢ 0. The proof of Koenigs' theorem follows readily if we write

$$k_n(s) = q_1^{n}\, s\prod_{j=1}^{n}\left\{1 + \frac{\beta[k_{j-1}(s)]}{q_1}\right\}, \qquad \text{where } \beta(s) = \frac{k(s)}{s} - q_1.$$

Now let $t_1$ be a positive number such that |ψ(s)| < 1 when 0 < |s| ≤ $t_1$ and Re(s) ≤ 0. (For the rest of this proof we assume Re(s) ≤ 0.) Such a number exists; on the imaginary axis we have $\psi(it) = 1 + it - \tfrac{1}{2}E[(w')^2]t^2 + o(t^2)$ where $E[(w')^2] > 1$, w' having the distribution branching from k(s), showing that |ψ(it)| < 1 if t ≠ 0 and sufficiently small; while if Re(s) < 0 we refer to the expression $\psi(s) = \int_0^{\infty} e^{su}\,dH(u)$. Let λ = Max |ψ(s)| for $t_1/x \le |s| \le t_1$.

If |s| > $t_1$ let N(s) be the smallest integer such that $|s|/x^{N(s)} \le t_1$. Then

$$\psi(s) = k_{N(s)}[\psi(s/x^{N(s)})] = q_1^{N(s)}B[\psi(s/x^{N(s)})][1 + O(q_1^{N(s)})] = B[\psi(s)][1 + O(q_1^{N(s)})].$$

Now $B[\psi(sx)] = q_1 B[\psi(s)]$. Let $M(s) = |s|^{\gamma}B[\psi(s)]$. Then M(sx) = M(s). Also $\log_x|s/t_1| \le N(s) < 1 + \log_x|s/t_1|$, and theorem 3.3 follows. Clearly $M(s)/|s|^{\gamma}$ is continuous for $t_1/x \le |s| \le t_1$, and hence, by functional continuation, wherever Re(s) ≤ 0, s ≠ 0.
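Koenigs' limit can be approximated directly by iterating k and rescaling. The sketch below does this for k(s) = 0.4s + 0.6s², the generating function of the numerical illustration in section 5, and checks the functional equation B[k(s)] = q₁B(s); the function name, the evaluation point and the iteration count of 60 are arbitrary illustrative choices.

```python
def koenigs_B(s, q, n=60):
    """Approximate B(s) = lim k_n(s) / q_1^n for k(s) = sum q[r] s^r with k(0) = 0."""
    def k(t):
        return sum(qr * t ** r for r, qr in enumerate(q))

    value = s
    for _ in range(n):
        value = k(value)          # value = k_n(s) after the loop
    return value / q[1] ** n

q = [0.0, 0.4, 0.6]               # k(s) = 0.4 s + 0.6 s^2, so q_1 = 0.4
s = 0.5                           # a point with |s| < 1

k_of_s = sum(qr * s ** r for r, qr in enumerate(q))
lhs = koenigs_B(k_of_s, q)        # B[k(s)]
rhs = q[1] * koenigs_B(s, q)      # q_1 B(s)
print(koenigs_B(s, q), lhs, rhs)
```

Because k_n(s) shrinks like q₁ⁿ, the rescaled iterates settle quickly, and the two sides of the functional equation agree to rounding error.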

Concerning the remarks following Theorem 3.3 we have the following:

(a) If $Ez_1^{r} < \infty$, r-fold differentiation of $\psi(sx^{n}) = k_n[\psi(s)]$ gives, for |s| ≥ $t_1$ > 0,

$$\psi^{(r)}(s) = x^{-nr}\sum_{j=1}^{r} Q_{rj}\,k_n^{(j)}\left[\psi\left(\frac{s}{x^{n}}\right)\right], \tag{9.1}$$

where $Q_{rj}$ is a polynomial in $\psi'\left(\dfrac{s}{x^{n}}\right), \cdots, \psi^{(r)}\left(\dfrac{s}{x^{n}}\right)$. Now $|k_n(s)| = O(q_1^{n})$ when |s| ≤ λ; because of analyticity, the same must be true of $|k_n^{(j)}(s)|$. Put n = N(s) in (9.1), N(s) being the integer defined above. Since $k_n^{(j)} = O(q_1^{N(s)}) = O(1/|s|^{\gamma})$, remark (a) follows.

(b) B(s) is clearly > 0 when s > 0; hence M(s) > 0 when s < 0. Since B(0) = 0, B(s) ≠ 0 for sufficiently small s ≠ 0; since ψ(s) → 0 as |s| → ∞, M(s) ≠ 0 for |s| sufficiently large; since M(sx) = M(s), remark (b) follows.

(c) If γ = ∞, i.e., $q_1 = 0$, then $k_n(s)$ goes to zero with great rapidity as n → ∞, if |s| < 1. The general line of argument is clear.

(B) Let k(s) be a polynomial of degree d > 1 with real coefficients, $k(s) = q_0 + \cdots + q_d s^{d}$, with a non-negative double point, k(a) = a ≥ 0, and such that k(s) > s when s > a. Let ψ(s) be any solution of the functional equation ψ(ms) = k[ψ(s)] which is continuous for s > 0 and satisfies ψ(s) > a for s > 0; here m is any number > 1. Then theorem 3.4 holds, with x replaced by m.

It is not difficult to show that if $a < s_1 \le s \le s_2$, $\lim_{j\to\infty} k_j(s) = \infty$ uniformly in s. Hence ψ(s) → ∞ as s → ∞. Write $R(s) = \log\left(1 + \dfrac{1}{q_d}\sum_{j=1}^{d} q_{d-j}s^{-j}\right)$. Then

$$d^{-n}\log\psi(sm^{n}) = d^{-n}\log k_n[\psi(s)] = (1 - d^{-n})\frac{\log q_d}{d - 1} + \log\psi(s) + \sum_{j=1}^{n} d^{-j}R(k_{j-1}[\psi(s)]),$$

s being taken large enough so that R is continuous. Thus, since the functions $R(k_{j-1}[\psi(s)])$ are bounded, the functions $d^{-n}\log\psi(sm^{n})$ converge uniformly, for s sufficiently large, to a continuous function L*(s) satisfying L*(ms) = dL*(s). Let $L(s) = s^{-\rho}L^*(s)$, where $\rho = \log_m d$. Theorem 3.4 now follows by an argument similar to that used to conclude theorem 3.3.

(Note that $\sum_{j=n+1}^{\infty} d^{-j}R(k_{j-1}[\psi(s)]) = O(d^{-n})$.)

(C) In order to avoid negative signs we work with the Laplace transform instead of the m.g.f.

Let H(u) be nondecreasing on (0, ∞) with H(0) = 0; let $\psi(s) = \int_0^{\infty} e^{-su}\,dH(u)$ be finite for s > 0. Suppose $\psi(s) = \dfrac{M(s)}{s^{\gamma}} + o\left(\dfrac{1}{s^{\gamma}}\right)$ as s → ∞, where 0 < γ < ∞, M(s) is continuous and satisfies M(sx) = M(s) for s > 0, x being some number > 1. Then

$$\lim_{u\to 0+}\int_u^{ux} \frac{H(v)}{v^{\gamma+1}}\,dv = \frac{1}{\Gamma(\gamma+1)}\int_1^{x} \frac{M(v)}{v}\,dv.$$

Following the lines of the proof of Karamata's theorem, we see that for any y > 0, $\int_y^{xy} s^{\gamma-1}\psi(s)\,ds = D + o(1)$ as y → ∞ where $D = \int_1^{x} \dfrac{M(s)}{s}\,ds$; i.e., $\int_y^{xy} s^{\gamma-1}\,ds\int_0^{\infty} e^{-su}\,dH(u) = D + o(1)$, or replacing s by (n + 1)s,

$$\int_y^{xy} s^{\gamma-1}\,ds\int_0^{\infty} e^{-su}e^{-nsu}\,dH(u) = \frac{D}{(n+1)^{\gamma}} + o(1) = \frac{D}{\Gamma(\gamma)}\int_0^{\infty} e^{-s}e^{-ns}s^{\gamma-1}\,ds + o(1).$$

It follows as in [14], pp. 189-192, that if F(u) is any function of bounded variation in (0, 1) we have

$$\lim_{y\to\infty}\int_y^{xy} s^{\gamma-1}\,ds\int_0^{\infty} e^{-su}F(e^{-su})\,dH(u) = \frac{D}{\Gamma(\gamma)}\int_0^{\infty} e^{-s}F(e^{-s})s^{\gamma-1}\,ds. \tag{9.2}$$

Let $F(e^{-s}) = e^{s}$ if 0 < s < 1 and 0 otherwise. Then the theorem follows from (9.2).

(D) Theorem 4.3 is true if F(u) is any bounded monotone increasing function. For simplicity we assume that F(1) = 0; it is readily seen that this causes no loss in generality. The proof is given for the case 1 < ρ < ∞; it will be clear that ρ = 1 implies Q = ∞, while if ρ = ∞ (or if ξ(s) is not entire) Q = 1. Suppose m and n are positive integers such that m/n < ρ/(ρ − 1). Then

$$\int_1^{\infty} \exp(u^{m/n})\,dF(u) = \sum_{r=0}^{\infty} \frac{1}{r!}\int_1^{\infty} u^{mr/n}\,dF(u) \le n\sum_{r=0}^{\infty} \frac{[(r+1)m]!}{(rn)!}\,c_{(r+1)m} \tag{9.3}$$

where $c_k = \dfrac{\xi^{(k)}(0)}{k!}$; interchange of integration and summation is justified by the positiveness of all terms involved. Suppose $0 < \epsilon < \dfrac{n}{m} - \left(1 - \dfrac{1}{\rho}\right)$; for k sufficiently large the inequality $c_k < k^{-k((1/\rho)-\epsilon)}$ is satisfied; see [18], p. 253. Hence using Stirling's formula, we see that the last series in (9.3) is dominated by a series whose rth term, for r sufficiently large, is controlled by the factor $r^{rm(1-(1/\rho)+\epsilon-(n/m))}$. Since $1 - \dfrac{1}{\rho} + \epsilon - \dfrac{n}{m}$ is negative, the series, and hence the integral, converges. We have thus proved $\dfrac{1}{\rho} + \dfrac{1}{Q} \le 1$.

Conversely, suppose $\dfrac{m}{n} > \dfrac{\rho}{\rho - 1}$. Let $\xi(s) = \sum_{k=0}^{m-1} \xi_k(s)$, where $\xi_k(s) = \sum_{r=0}^{\infty} c_{k+rm}s^{k+rm}$, k = 0, 1, ···, m − 1. At least one of the functions $\xi_k(s)$ must be of order ρ. We suppose that $\xi_0(s)$ is; if not the argument would need only slight modifications. We have

$$\int_1^{\infty} \exp(u^{m/n})\,dF(u) \ge n\sum_{r=0}^{\infty} \frac{(rm)!}{[(r+1)n]!}\,c_{rm}. \tag{9.4}$$

Suppose $0 < \epsilon < 1 - \dfrac{1}{\rho} - \dfrac{n}{m}$. From [18], p. 253, the inequality $c_{rm} > (rm)^{-rm((1/\rho)+\epsilon)}$ must hold for infinitely many values of r. As in the first half of the proof this shows that the series and the integral in (9.4) diverge. Thus $\dfrac{1}{\rho} + \dfrac{1}{Q} \ge 1$ and the proof is complete.

If ρ is rational, the conjecture following theorem 4.3 can be proved in a similar manner making use of a relation between the class of an entire function and the coefficients of its series expansion; see [14], p. 95.


REFERENCES 

[1] R. A. Fisher, The Genetical Theory of Natural Selection, Oxford, the Clarendon Press, 1930.

[2] Alfred J. Lotka, Théorie Analytique des Associations Biologiques, 2, Paris, Hermann, 1930.

[3] J. F. Steffensen, "Deux problèmes du calcul des probabilités," Annales de l'Inst. Henri Poincaré, Vol. 3 (1933), p. 331.

[4] D. Hawkins and S. Ulam, Theory of Multiplicative Processes, I, Los Alamos Declassified Document 265, 1944.

[5] A. Kolmogoroff, "Zur Lösung einer biologischen Aufgabe," Communications for Math. and Mechanics at Tchebycheff University, Tomsk, Vol. 2, part 1 (1938), p. 1; see Fortschritte der Math., Vol. 64 (1938), part 2, p. 1223.

[6] A. Kolmogoroff and N. A. Dmitriev, "Branching stochastic processes," C. R. (Doklady) Acad. Sci. URSS (N.S.), Vol. 56 (1947), pp. 5-8. See Math. Rev. Vol. 9, No. 1 (1948), p. 46.

[7] A. M. Yaglom, "Certain limit theorems of the theory of branching random processes," Doklady Akad. Nauk SSSR (N.S.), Vol. 56 (1947), pp. 795-798. See Math. Rev. Vol. 9, No. 3 (1948), p. 149.

[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea, 1946.

[9] J. Hadamard, "Two works on iteration and related questions," Am. Math. Soc. Bull., Vol. 50 (1944), p. 67.

[10] G. Koenigs, "Nouvelles recherches sur les intégrales de certaines équations fonctionnelles," Annales Sci. de l'École Normale Sup. de Paris, Vol. 1, series 3, suppl. (1884), p. 3.

[11] H. Poincaré, "Sur une classe nouvelle de transcendantes uniformes," Journal de Math. Pures et Appliquées, Vol. 6, 4th series (1890), p. 313.

[12] T. E. Harris, Some Theorems on the Bernoullian Multiplicative Process, Dissertation, Princeton, 1947.

[13] H. Cramér, Random Variables and Probability Distributions, Cambridge Tract 36, 1937.

[14] D. V. Widder, The Laplace Transform, Princeton University Press, 1941.

[15] P. Fatou, "Sur une classe remarquable de séries de Taylor," Annales Sci. de l'École Normale Sup. de Paris, Vol. 27, series 3 (1910), pp. 43-53.

[16] S. Lattès, "Sur les suites récurrentes non linéaires et sur les fonctions génératrices de ces suites," Annales de la Fac. des Sciences de l'Univ. de Toulouse, Vol. 3, series 3 (1911), pp. 96-105.

[17] Niels Arley, On the Theory of Stochastic Processes and Their Applications to the Theory of Cosmic Radiation, G.E.C. Gads Forlag, Copenhagen, 1943.

[18] E. C. Titchmarsh, The Theory of Functions, 2d Ed., Oxford, 1939.

[19] S. M. Shah, "On real continuous solutions of algebraic difference equations," Am. Math. Soc. Bull., Vol. 53 (1947), pp. 548-558.



MOST POWERFUL TESTS OF COMPOSITE HYPOTHESES. I. NORMAL 

DISTRIBUTIONS 

By E. L. Lehmann and C. Stein 
University of California, Berkeley 

Summary. For testing a composite hypothesis, critical regions are determined which are most powerful against a particular alternative at a given level of significance. Here a region is said to have level of significance ε if the probability of the region under the hypothesis tested is bounded above by ε. These problems have been considered by Neyman, Pearson and others, subject to the condition that the critical region be similar. In testing the hypothesis specifying the value of the variance of a normal distribution with unknown mean against an alternative with larger variance, and in some other problems, the best similar region is also most powerful in the sense of this paper. However, in the analogous problem when the variance under the alternative hypothesis is less than that under the hypothesis tested, in the case of Student's hypothesis when the
level of significance is less than and in some other cases, the best similar region 
is not most powerful in the sense of this paper. There exist most powerful tests 
which are quite good against certain alternatives in some cases where no proper 
similar region exists. These results indicate that in some practical cases the 
standard test is not best if the class of alternatives is sufficiently restricted. 

1. Introduction. The problem to be discussed in this paper is that of testing a composite hypothesis against a simple alternative. More specifically let ℱ = {f} be a family of probability density functions defined over a Euclidean space $R_n$ and let g be a probability density function not in ℱ. We wish to test the hypothesis $H_0$ that the random variable $X = (X_1, \cdots, X_n)$ is distributed according to a density f of ℱ against the alternative $H_1$ that X is distributed according to g. By a test we mean a region of rejection, w in $R_n$.

Neyman and Pearson, in the fundamental paper [1] which laid the groundwork of the theory of optimum tests, restricted their considerations to similar regions. They considered a region (set) w to be optimum for the given level of significance ε if it maximizes the power

$$\int_w g(x)\,dx \tag{1}$$

subject to the restriction

$$\int_w f(x)\,dx = \epsilon \quad \text{for all } f \text{ in } \mathcal{F}. \tag{2}$$

As Neyman, Wald and others have pointed out, it is more natural to replace the condition of similarity (2) by the weaker restriction

$$\int_w f(x)\,dx \le \epsilon \quad \text{for all } f \text{ in } \mathcal{F}. \tag{3}$$




A region $w$ maximizing (1) subject to (3) is called most powerful against the alternative $g$ at the level of significance $\epsilon$. Here and throughout the paper, all functions and sets are assumed to be Borel measurable.

In the present paper we shall consider certain composite hypotheses, and derive tests for them which are most powerful against a simple alternative. For the cases in which these tests coincide with the standard similar regions it will thus be established that no further increase in power is possible with tests of fixed sample sizes. In the more usual situation where the most powerful test depends strongly on the specific alternative chosen, no such absolute justification of the standard test is possible. In these cases, any justification must take account of the fact that it is desired to obtain good power against a large class of alternatives. This can be done, for instance, by using Wald's definition of a most stringent test [2] or his concept of minimizing the maximum risk.¹ If, on the other hand, the class of alternatives is sufficiently restricted, the results of the present paper indicate that for small samples there may exist a test which is appreciably better than the standard test.

Frequently the probability of an error of the first kind is an analytic function of a nuisance parameter for every choice of critical region. Hence, if it is known that some nuisance parameter $\theta$ lies, say, in a certain finite interval $I$, then any test which is similar for $\theta$ in $I$ will be similar for all $\theta$. Consequently, the knowledge concerning $\theta$ cannot be used to find a more powerful test. On the other hand, as is indicated at the end of section 5, restrictions of the nuisance parameters may, for small samples, lead to considerably more powerful tests if the condition of similarity is replaced by the weaker condition (3).

There is one class of problems to which it may be desirable to apply the method of the present paper regardless of sample size; namely, if no similar region exists. Suppose, for instance, that $X_1, \dots, X_n$ are known to be normally and independently distributed, $X_i$ having unknown mean $\xi_i$ and variance $\sigma_i^2$ for $i = 1, \dots, n$. For testing the hypothesis

$H_0$: $\sigma_i = 1$, $(i = 1, \dots, n)$

no similar region exists, while it is easy to see that against any simple alternative

$H_1$: $\sigma_i = \sigma_{i1} < 1$, $\xi_i = \xi_{i1}$,

there exists a test which satisfies condition (3) and which has good power against $H_1$ provided the $\sigma_{i1}$ are sufficiently small.

The present first part of this paper is restricted to hypotheses concerning normal distributions. It is intended to extend the considerations to exponential and rectangular distributions, to consider non-parametric problems and possibly also more complicated problems connected with normal distributions, in later parts of the paper.

¹ In an unpublished paper, it is shown by G. Hunt and C. Stein that the traditional test is most stringent in several cases, including the (univariate) linear hypothesis and the hypothesis specifying the ratio of the variances of two normal distributions. These results can be extended to analogous problems for distributions other than the normal, and similar results can be proved regarding minimization of the maximum risk if the weight function has a certain type of symmetry.


2. Sufficient conditions for a most powerful test. The method which will be used in this paper to obtain most powerful tests is an adaptation of the fundamental lemma of Neyman and Pearson [1]. At the same time it is essentially a special case of much more general results of Wald [3, 4], although the exact conditions of Wald's investigation are not satisfied in most of our problems.

Let $h$ and $g$ be two functions defined over $R_n$, let $k$ be a constant and let $w$ be a region in $R_n$ such that

(4)  $g(x) \ge k\,h(x)$ in $w$;
     $g(x) \le k\,h(x)$ in $R_n - w$.

Then if $w'$ is such that

(5)  $\int_{w'} h(x)\,dx \le \int_w h(x)\,dx$,

it follows as in the fundamental lemma, where in (5) equality is assumed instead of inequality, that

(6)  $\int_{w'} g(x)\,dx \le \int_w g(x)\,dx$.

Throughout the present paper we shall be concerned with the special case in which $\mathcal{F}$ is an $s$-parameter family. We may denote the members of $\mathcal{F}$ by $f_\theta$ and we shall obtain all members of $\mathcal{F}$ as $\theta$ ranges over a set $\omega$ in an $s$-dimensional Euclidean space. In the theorem which we shall now state, we shall be concerned with point functions $\lambda$ defined over $\omega$. We shall assume that $\lambda = c\mu$, where $c$ is a positive constant and $\mu$ a cumulative distribution function.² Also we suppose that $f_\theta(x)$ is a measurable function of $x$ and $\theta$ jointly. However, the theorem is also valid if $\omega$ is an abstract space and $\lambda$ a (finite) non-negative additive set function (measure) over $\omega$. Such more general interpretation may be required when applying the theory to non-parametric problems.

Theorem 1. Let $H_0$ be the hypothesis that the random variable $X$ is distributed according to a density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the alternative that $X$ is distributed according to a density $g$. Let $\lambda$ be a function defined over $\omega$ and such that

(7)  $\lambda = c\mu$,

where $c$ is a positive constant and $\mu$ a cumulative distribution function. Let $k$ be a constant and let $w$ be a region in $R_n$ such that

(8)  $g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$;
     $g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Suppose that $w$ is of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is, that

(9)  $\int_w f_\theta(x)\,dx \le \epsilon$ for all $\theta$ in $\omega$,

and suppose that the subset of $\omega$ for which

(10)  $\int_w f_\theta(x)\,dx < \epsilon$

has $\lambda$-measure zero. Then $w$ is most powerful for testing $H_0$ against $H_1$ at level of significance $\epsilon$.

² The introduction of the distribution $\mu$ is simply a mathematical device and does not imply that $\theta$ is a random variable (see Wald [16] p. 282).

Proof. Without loss of generality we shall assume $c = 1$. Let $w'$ be any test of level of significance $\epsilon$. Then

(11)  $\int_{w'} f_\theta(x)\,dx \le \epsilon$ for all $\theta$ in $\omega$,

and because of (7)

(12)  $\int_\omega \left[\int_{w'} f_\theta(x)\,dx\right] d\lambda(\theta) \le \epsilon \int_\omega d\lambda(\theta) = \epsilon$.

Since $\lambda$ is of bounded variation we may interchange the order of integration in (12) and obtain

(13)  $\int_{w'} h(x)\,dx \le \epsilon$,

where

(14)  $h(x) = \int_\omega f_\theta(x)\,d\lambda(\theta)$.

From (9) and the condition surrounding (10) it follows that

(15)  $\int_\omega \left[\int_w f_\theta(x)\,dx\right] d\lambda(\theta) = \epsilon$

and therefore that

(16)  $\int_w h(x)\,dx = \epsilon$.

Thus $w$ and $w'$ satisfy conditions (4) and (5), and hence also (6), which completes the proof.





It is useful to notice that the assumptions of theorem 1 will be satisfied provided

$\int_w f_\theta(x)\,dx$

attains its maximum $\epsilon$ at all points of increase of $\lambda$, and therefore in particular whenever $w$ is a similar region of size $\epsilon$.

We shall in many problems exhibit a function $\lambda$ which satisfies the conditions of theorem 1 without giving the reasons which led us to this function. However, the following comments concerning the tentative process that we used may be helpful. One may first examine the known most powerful similar region. If there exists a cumulative distribution function $\lambda$ such that (8) is the most powerful similar region, the problem is solved. If the most powerful similar region cannot even be approximated by (8) with a sequence of $\lambda$'s, it is reasonable to conclude that the most powerful test is not similar. Because the probability (under the null hypothesis) of any test is in all the problems considered here an analytic function of the parameter, this implies that the probability (under the null hypothesis) of the most powerful test attains its maximum at an at most denumerable (in some cases finite) set of points. In all the cases of this kind which we considered in the present part I, it was then possible to prove the existence of a function $\lambda$ with a single point of increase, which satisfied the conditions of theorem 1.

A theorem analogous to theorem 1 holds for most powerful similar regions. Let $H_0$ and $H_1$ be as before and let $\lambda$ be a function of bounded variation, not necessarily non-decreasing. Let $w$ be a region in $R_n$ such that

(17)  $g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$;
      $g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Let $w$ be a similar region of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is, let

(18)  $\int_w f_\theta(x)\,dx = \epsilon$ for all $\theta$ in $\omega$;

then $w$ is a most powerful similar region for testing $H_0$ against $H_1$.

For all the problems considered in this paper we shall prove the existence of functions $\lambda$ satisfying the conditions of theorem 1, but we have not investigated the corresponding existence problem in general. On the other hand one verifies easily that for many of the cases treated here in which the most powerful test is not similar, the method for obtaining most powerful similar regions does not apply. However, for all the problems considered in the present paper the most powerful similar tests can be obtained easily by other methods [1, 5, 6, 7, 8]. For most of the problems the corresponding derivations have been carried out in the literature.





Although we restrict ourselves in the present paper to the problem of maximizing the power at a single alternative, theorem 1 clearly also applies to the more general problem of maximizing the average power over surfaces in a space of alternatives. Such problems have been considered from the point of view of similar regions by Wald, Hsu and others [9, 10, 11].


3. Testing the values of one or several variances. Let $X_1, \dots, X_n$ be a sample from a normal population with mean $\xi$ and variance $\sigma^2$, both unknown. We want to test the hypothesis $H_0$ that $\sigma = \sigma_0$ against the simple alternative $H_1$ that $\sigma = \sigma_1$, $\xi = \xi_1$. We shall show that the most powerful test for $H_0$ against $H_1$ is

(19)  $\sum (x_i - \xi_1)^2 \le k$ when $\sigma_1 < \sigma_0$;

(20)  $\sum (x_i - \bar{x})^2 \ge c$ when $\sigma_1 > \sigma_0$,

where $k$ and $c$ are determined by the level of significance. Thus the best similar region is most powerful if the variance under the alternative is greater than that under the null hypothesis, while the most powerful tests against the other alternatives are not similar. That the region $\sum (x_i - \bar{x})^2 \ge c$ ($\le c'$) is most powerful of all similar regions against $\sigma_1 > \sigma_0$ ($\sigma_1 < \sigma_0$) was shown by Neyman and Pearson [1].

We consider first the case $\sigma_1 < \sigma_0$, and apply theorem 1 with $\lambda$ a step function having a single jump at $\xi_1$, that is,

(21)  $\lambda(\xi) = 0$ if $\xi < \xi_1$;
      $\lambda(\xi) = 1$ if $\xi \ge \xi_1$.

The region $w$ given by (8) thus becomes

(22)  $\dfrac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i - \xi_1)^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i - \xi_1)^2\right]} \ge K$,

which is equivalent to

(23)  $\sum (x_i - \xi_1)^2 \le k$,

since $\sigma_1 < \sigma_0$. The size of the region (23), that is, its probability under the null hypothesis, is a function of $\xi$ and clearly attains its maximum when $\xi = \xi_1$. Thus all conditions of theorem 1 are satisfied provided we choose $k$ so that the maximum size of (23) equals $\epsilon$.
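For a numerical feel for the gain, the power of the region (23) may be compared with that of the best similar region $\sum (x_i - \bar{x})^2 \le c'$ at the same level. The Python sketch below is an illustration under stated assumptions, not a computation from the paper: it uses the standard facts that $\sum (X_i - \xi_1)^2/\sigma^2$ is chi-square with $n$ degrees of freedom when $\xi = \xi_1$, and that $\sum (X_i - \bar{X})^2/\sigma^2$ is chi-square with $n - 1$ degrees of freedom. The helper names `chi2_cdf`, `chi2_ppf` and `powers`, and the sample values, are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), computed from the
    power series of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p, df):
    """Quantile of the chi-square distribution, by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def powers(n, sigma0, sigma1, eps):
    """Power at (xi_1, sigma_1) of region (23) and of the best similar region."""
    # Region (23): its maximum size, attained at xi = xi_1, is set to eps.
    k = sigma0 ** 2 * chi2_ppf(eps, n)
    power_23 = chi2_cdf(k / sigma1 ** 2, n)
    # Best similar region: sum (x_i - xbar)^2 <= c', size eps for every xi.
    c_prime = sigma0 ** 2 * chi2_ppf(eps, n - 1)
    power_similar = chi2_cdf(c_prime / sigma1 ** 2, n - 1)
    return power_23, power_similar


# Hypothetical example: n = 5, sigma_0 = 1, sigma_1 = 0.5, eps = 0.05.
p23, psim = powers(5, 1.0, 0.5, 0.05)
print(p23, psim)
```

The series used in `chi2_cdf` is adequate for the moderate arguments arising here; a library routine would normally be preferred for large arguments.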

Before considering the case $\sigma_1 > \sigma_0$ we state for later reference the following:

Lemma 1. If $\sigma_1 > \sigma_0$ there exists an absolutely continuous non-decreasing function $\lambda$ of bounded variation such that

(24)  $\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma_0^2}(x - \xi)^2\right] d\lambda(\xi) = C \exp\left[-\frac{1}{2\sigma_1^2}(x - \xi_1)^2\right]$.

This follows immediately from the well known representation of the Gaussian exponential as a Laplace transform by applying a translation, and is easily verified directly by substituting

(25)  $\lambda'(\xi) = \exp\left[-\frac{1}{2(\sigma_1^2 - \sigma_0^2)}(\xi - \xi_1)^2\right]$.

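Now let $\sigma_1 > \sigma_0$ and $n > 1$. The region $w$ given by (8) can be expressed in the form

(26)  $\dfrac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i - \bar{x})^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i - \bar{x})^2\right]} \cdot \dfrac{\exp\left[-\frac{n}{2\sigma_1^2}(\bar{x} - \xi_1)^2\right]}{\int \exp\left[-\frac{n}{2\sigma_0^2}(\bar{x} - \xi)^2\right] d\lambda(\xi)} \ge k$.

By lemma 1 there exists an absolutely continuous function $\lambda$ for which the second factor is constant. For this $\lambda$, (26) is equivalent to

(27)  $\sum (x_i - \bar{x})^2 \ge c$,

and since this is a similar region, the conditions of theorem 1 are satisfied provided $c$ is chosen so as to give the correct level of significance.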

We next consider the problem in which the random variables $X_i$ $(i = 1, \dots, n)$ are independently normally distributed with unknown means $\xi_i$ and unknown variances $\sigma_i^2$. We wish to test the hypothesis $H_0$: $\sigma_i = \sigma_{i0}$ for $i = 1, \dots, n$ against the alternative $H_1$: $\sigma_i = \sigma_{i1}$, $\xi_i = \xi_{i1}$. Feller [12] showed that there exist no similar regions for this problem. However, as we shall show now, when the critical regions are not required to be similar, non-trivial tests against $H_1$ do exist provided $\sigma_{i1} < \sigma_{i0}$ for at least one value of $i$.

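Let us assume without loss of generality that $\sigma_{i1} < \sigma_{i0}$ for $i = 1, \dots, m$; $\sigma_{i1} \ge \sigma_{i0}$ for $i = m + 1, \dots, n$, where $n - m$ may be zero but where for the moment we shall assume $m > 0$. With $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$, the region (8) becomes

(28)  $\prod_{i=1}^{m} \dfrac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i - \xi_{i1})^2\right]}{\int \exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i - \xi_i)^2\right] d\lambda_i(\xi_i)} \cdot \prod_{i=m+1}^{n} \dfrac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i - \xi_{i1})^2\right]}{\int \exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i - \xi_i)^2\right] d\lambda_i(\xi_i)} \ge k$.

For $\lambda_i$ $(i = 1, \dots, m)$ we take step functions with a single jump at $\xi_{i1}$, while for the remaining $\lambda$'s we choose the absolutely continuous functions which make the second factor constant and whose existence is guaranteed by lemma 1. The region (28) thus reduces to

(29)  $\sum_{i=1}^{m} \left(\frac{1}{\sigma_{i1}^2} - \frac{1}{\sigma_{i0}^2}\right)(x_i - \xi_{i1})^2 \le c$.

Since the probability of the region (29) is independent of $\xi_{m+1}, \dots, \xi_n$ and with varying $\xi_1, \dots, \xi_m$ takes on its maximum when $\xi_i = \xi_{i1}$, it follows from theorem 1 that this region is most powerful for testing $H_0$ against $H_1$.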

We still have to consider the case $m = 0$, that is, the case in which $\sigma_{i1} \ge \sigma_{i0}$ for all $i$. To treat this problem we adjoin to the variables $X_1, \dots, X_n$ a random variable $Y$ uniformly distributed between 0 and 1, that is, essentially a table of random numbers. In the space of $n + 1$ random variables we determine a region $w$ according to (8), letting $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$ and choosing the $\lambda$'s so as to make the left hand side of (8) equal to the right hand side. This is possible by lemma 1, and with this choice of the $\lambda$'s the inequalities (8) become

(30)  $k \ge k$ in $w$;
      $k \le k$ in $R_{n+1} - w$,

and hence they impose no restrictions on $w$. Thus any similar region of the correct size will satisfy the conditions of theorem 1. It follows that the region

(31)  $w$: $0 \le y \le \epsilon$,

being a similar region of size $\epsilon$, is most powerful. This result means that we do not use the observations $x_1, \dots, x_n$ at all but consult a table of random numbers.

The situation just described occurs in other problems to which the same method of proof can be applied. It is therefore convenient for later reference to formulate the following

Theorem 2. Let $H_0$ be the hypothesis that the random variable $X$ is distributed according to a probability density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the alternative that $X$ is distributed according to the density function $g$. Let $Y$ be a random variable known to be uniformly distributed over the interval $[0, 1]$. If there exists a real valued function $\lambda$ satisfying (7) for which

(32)  $g(x) = k \int_\omega f_\theta(x)\,d\lambda(\theta)$,

then the critical region $0 \le y \le \epsilon$ is most powerful for testing $H_0$ against $H_1$ at level of significance $\epsilon$.

4. Testing equality of variances and the value of the circular serial correlation coefficient. For each $i = 1, \dots, m$ let $X_{ij}$ $(j = 1, \dots, n_i)$ be a sample from a normal distribution with $E(X_{ij}) = \xi_i$ and $E(X_{ij} - \xi_i)^2 = \sigma_i^2$. We are concerned with the hypothesis $H_0$ that $\sigma_1 = \sigma_2 = \dots = \sigma_m$, where first we shall assume the $\xi$'s to be known, so that without loss of generality we may assume them equal to 0. The alternative hypothesis specifies $\sigma_i = \sigma_{i1}$, $i = 1, \dots, m$. Let $\sigma^2$ denote the unknown common variance under $H_0$ and let $\lambda(\sigma)$ be a step function with a single jump at a point $\sigma_0$ to be determined later. With $k = \prod_{i=1}^{m} \left(\frac{\sigma_0}{\sigma_{i1}}\right)^{n_i}$, the test (8) takes on the form

(33)  $\dfrac{\exp\left[-\frac{1}{2}\sum_i \frac{1}{\sigma_{i1}^2}\sum_j x_{ij}^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum_i \sum_j x_{ij}^2\right]} \ge 1$,

or equivalently

(34)  $\dfrac{\sum_i \sum_j x_{ij}^2}{\sum_i \frac{1}{\sigma_{i1}^2}\sum_j x_{ij}^2} \ge \sigma_0^2$.

Since the function on the left hand side is homogeneous of degree 0 in the $x$'s, this is a similar region and the conditions of theorem 1 are therefore satisfied provided the region has the correct size. This can be achieved for any level of significance $\epsilon$ by proper choice of $\sigma_0^2$.

As stated earlier, the conditions of theorem 1 imply that the size of the critical region is equal to $\epsilon$ at all points of increase of $\lambda$. As a consequence, if the size equals $\epsilon$ at only a finite number of points of $\omega$, $\lambda$ must be a step function. Also, if each point of a certain interval is a point of increase of $\lambda$, the critical region must be similar over that interval (and, if the functions involved are analytic, the region must be similar over $\omega$). However, the last problem shows that the converse of neither of these two statements is correct. For the region (34) is a similar region although the corresponding $\lambda$ has only a single point of increase.

Next we consider the hypothesis of equality of variances without assuming the means to be known. For the case $m = 2$ the most powerful similar region was obtained by Neyman and Pearson [1]. We assume first that $n_i > 1$ for all $i$, and we take $\lambda(\sigma, \xi_1, \dots, \xi_m) = \lambda_0(\sigma) \prod_{i=1}^{m} \lambda_i(\xi_i)$, where $\lambda_0$ is, as before, a step function with a single jump at a point $\sigma_0$ to be determined later. Suppose now that $\sigma_0 > \sigma_{i1}$ for $i = 1, \dots, s$; $\sigma_0 \le \sigma_{i1}$ for $i = s + 1, \dots, m$; $\sigma_{11} \le \sigma_{21} \le \dots \le \sigma_{m1}$, where $0 \le s \le m$ and $s$ depends on $\sigma_0$. Then define

(35)  $\lambda_i(\xi_i) = 0$ if $\xi_i < \xi_{i1}$;
      $\lambda_i(\xi_i) = 1$ if $\xi_i \ge \xi_{i1}$

for $i = 1, \dots, s$ and use lemma 1 for $i = s + 1, \dots, m$.

For proper choice of $k$ the critical region will then be determined by the inequality

(36)  $\sum_{i=1}^{s} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_{i1}^2}\right) \sum_j (x_{ij} - \xi_{i1})^2 + \sum_{i=s+1}^{m} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_{i1}^2}\right) \sum_j (x_{ij} - \bar{x}_i)^2 \ge 0$.

The probability of this region, computed under $H_0$, is independent of $\xi_{s+1}, \dots, \xi_m$ and for any $\sigma$ attains its maximum when $\xi_i = \xi_{i1}$ $(i = 1, \dots, s)$. Since the probability of the region is independent of $\sigma$ when $\xi_i = \xi_{i1}$ for $i = 1, \dots, s$, the conditions of theorem 1 are again established. That for $\xi_i = \xi_{i1}$ the size of (36) goes continuously from 0 to 1 with decreasing $\sigma_0$ is easily checked, since at the only doubtful points $\sigma_0 = \sigma_{i1}$ (where the value of $s$ changes), the corresponding coefficient $\frac{1}{\sigma_0^2} - \frac{1}{\sigma_{i1}^2}$ passes through 0.

ao a, 1 

We still have to consider the case that some of the $n_i$ are equal to 1. If $n_i = 1$ for some $i \le s$ there is no change whatever, while if $n_i = 1$ for some $i > s$, the corresponding term in (36) vanishes. It follows easily that if $n_i > 1$ for at least one value of $i > 1$ the solution (36) is valid. On the other hand, if $n_i = 1$ for all $i > 1$, we can apply theorem 2 by taking $\sigma_0 = \sigma_{11}$, $\lambda_1(\xi_1)$ as a step function with a single jump at $\xi_{11}$ and the remaining $\lambda_i(\xi_i)$ according to lemma 1. It thus follows that for this problem no non-trivial test exists.

The following problem can be reduced to the hypothesis of equality of variances with means assumed known: Under the null hypothesis $X_1, \dots, X_n$ have a joint multivariate normal distribution with density $C \exp\left[-\frac{1}{2\sigma^2}\sum a_{ij} x_i x_j\right]$, where the $a_{ij}$'s are known and where $\sigma$ is an unknown scale factor. Under $H_1$ the $X$'s have a joint multivariate normal distribution with density $C' \exp\left[-\frac{1}{2}\sum b_{ij} x_i x_j\right]$. A number of hypotheses specifying the value of one or several correlation coefficients have this form. The most powerful test of $H_0$ against $H_1$ is given by

(36)  $\dfrac{\sum b_{ij} x_i x_j}{\sum a_{ij} x_i x_j} \le c$,

as is easily shown by applying a non-singular linear transformation which reduces $\sum b_{ij} x_i x_j$ to diagonal form and $\sum a_{ij} x_i x_j$ to a sum of squares, or by applying directly the method of proof of the earlier problem.

A corresponding reduction when the $X$'s have a common but unknown mean is usually impossible. One problem of this kind for which the solution is simple is the hypothesis specifying the value of a serial correlation coefficient in a circular population. The most powerful similar region for testing this hypothesis was obtained in [7]. Consider the probability density function

(37)  $C \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} \left\{(x_i - \xi) - \delta(x_{i+1} - \xi)\right\}^2\right]$, $(x_{n+1} = x_1)$, $|\delta| < 1$,

and let $H_0$ specify $\delta = \delta_0$ while $H_1$ assigns to the parameters the values $\sigma_1$, $\xi_1$, $\delta_1$. Then the most powerful test of $H_0$ against $H_1$ is

(38)  $\dfrac{\sum (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum (x_i - \bar{x})^2} \ge l_0$ if $\delta_1 > \delta_0$;

      $\dfrac{\sum (x_i - \xi_1)(x_{i+1} - \xi_1)}{\sum (x_i - \xi_1)^2} \le l_0'$ if $\delta_1 < \delta_0$.

We shall omit the proof of this result, since the method is the same as in the other problems considered in this section.


5. Student's hypothesis and some generalizations. As the principal result of the present section we shall prove that for testing Student's hypothesis against a simple alternative the most powerful test is a non-similar region of the form

(39)  $\sum (x_i - \eta)^2 \le k$,

if the level of significance $\epsilon$ is less than or equal to $\tfrac12$. Here $\eta$ and $k$ depend on $\epsilon$ and on the alternative, and they will not be determined explicitly. It will be shown also that if $\epsilon$ is greater than or equal to $\tfrac12$, Student's test is most powerful. These results will be extended rather easily to the general univariate linear hypothesis. The corresponding investigation for similar regions was carried through for Student's hypothesis by Neyman and Pearson [1], while the extension to a general linear hypothesis is contained in a paper by Hsu [13].

The proof of the main result mentioned above is rather lengthy. We shall begin by proving two lemmas.

Lemma 2. Let $Y_1, \dots, Y_n$ be $n$ independent random variables, normally distributed with 0 mean and unit variance, and let

(40)  $P(a, k) = P\left\{\sum (Y_i - a)^2 \le (n - k)a^2\right\}$,
      $\varphi(k) = \sup_a P(a, k)$ for $0 < k < n$, $0 < a$.

Then for each $k$ there exists $a(k)$ such that

(41)  $P(a(k), k) = \varphi(k)$.

Proof. If $Z_i = Y_i/a$ $(i = 1, \dots, n)$, the $Z$'s are independently normally distributed with zero mean and variance $1/a^2$, and (40) may be written as

(42)  $P(a, k) = P\left\{\sum (Z_i - 1)^2 \le n - k\right\}$.

Hence it is seen that for any $k$, $P(a, k)$ tends to zero as $a$ tends to either zero or infinity. This proves the lemma since for any $k$, $P(a, k)$ is a continuous function of $a$.

Lemma 3. Given any $\epsilon$, $0 < \epsilon < \tfrac12$, there exists $k(\epsilon)$ between zero and $n$ such that $\varphi(k(\epsilon)) = \epsilon$.





Proof. The proof will be given in a number of steps.

(i) $\varphi(k) \to \tfrac12$ as $k \to 0$.

Clearly $P(a, k)$ never exceeds $\tfrac12$. The result will therefore follow if we exhibit a sequence $a_k$ such that $P(a_k, k) \to \tfrac12$ as $k \to 0$. Let $a_k = 1/\sqrt{k}$. Then

(43)  $P(a_k, k) = P\left\{\sqrt{k}\sum Y_i^2 - 2\sum Y_i + \sqrt{k} \le 0\right\}$.

The right hand side is a continuous function of $k$ and therefore tends to

(44)  $P\left\{\sum Y_i \ge 0\right\} = \tfrac12$,

as $k$ tends to zero.

(ii) $\varphi(k) \to 0$ as $k \to n$.

Consider $P(a, k)$ as in (42). Written as an integral of the probability density of the $Z$'s, the region of integration is independent of $a$ and its volume tends to 0 as $k$ tends to $n$. On the other hand the probability density depends on $a$ but is uniformly bounded over the region of integration if $k > 0$, and hence the result follows.

(iii) If $0 < k_0$, $P(a, k)$ tends to zero uniformly for $k$ in the interval $k_0 \le k < n$ as $a$ tends to zero or infinity.

This follows from the fact that $0 \le P(a, k) \le P(a, k_0)$, since $P(a, k_0)$ tends to 0 as $a$ tends to zero or infinity.

(iv) Given $k_0$ and $k_1$ there exist numbers $a_0$ and $a_1$ with $0 < a_0 < a_1 < \infty$ such that $0 < k_0 \le k \le k_1 < n$ implies $a_0 \le a(k) \le a_1$.

If this were not true there would exist a sequence $k^{(i)}$ with $k_0 \le k^{(i)} \le k_1$ and $a(k^{(i)})$ tending to infinity or zero. Then $\varphi(k^{(i)}) = P(a(k^{(i)}), k^{(i)})$ would tend to zero by (iii). On the other hand consider $P(1, k)$ for $k_0 \le k \le k_1$. This is a continuous non-vanishing function of $k$ and hence attains its lower bound $m$ for some $k$ in $k_0 \le k \le k_1$. Therefore $m$ is positive and we have a contradiction.

(v) Given any $k_0$, $k_1$ with $0 < k_0 < k_1 < n$, $\varphi(k)$ is continuous on the interval $[k_0, k_1]$.

To see this, select $a_0$ and $a_1$ in accordance with (iv). Then $P(a, k)$ is uniformly continuous in the rectangle $a_0 \le a \le a_1$, $k_0 \le k \le k_1$. Given $\eta > 0$ let $\delta$ be such that $|k' - k''| < \delta$ implies $|P(a, k') - P(a, k'')| < \eta$. Then $\varphi(k') \ge P(a(k''), k') \ge P(a(k''), k'') - \eta = \varphi(k'') - \eta$, and by symmetry $\varphi(k'') \ge \varphi(k') - \eta$, which establishes the continuity of $\varphi$.

The proof of the lemma is now immediate. For let $0 < \epsilon < \tfrac12$. It follows from (i) and (ii) that there exist $k_0$ and $k_1$ such that

$\varphi(k_0) < \epsilon/2$, $\varphi(k_1) > \epsilon + \tfrac12(\tfrac12 - \epsilon)$,

and hence by (v) there exists $k(\epsilon)$ for which $\varphi(k(\epsilon)) = \epsilon$.
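Lemmas 2 and 3 can be made concrete in the simplest case $n = 1$, which is an assumption of this illustration and not a restriction in the text. There (40) reduces to $P(a, k) = \Phi(a(1 + s)) - \Phi(a(1 - s))$ with $s = \sqrt{1 - k}$, and setting the derivative in $a$ to zero gives the closed form $a(k) = \sqrt{\log\{(1 + s)/(1 - s)\}/(2s)}$. The Python sketch below uses these formulas; the names `P`, `a_of_k`, `phi` and `k_of_eps` are hypothetical, and the bisection relies on $\varphi$ being non-increasing in $k$.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def P(a, k):
    """P(a, k) of (40) for n = 1: P(|Y - a| <= a * sqrt(1 - k)), Y ~ N(0, 1)."""
    s = math.sqrt(1.0 - k)
    return Phi(a * (1.0 + s)) - Phi(a * (1.0 - s))


def a_of_k(k):
    """Maximizing a(k) of lemma 2 for n = 1, from dP/da = 0."""
    s = math.sqrt(1.0 - k)
    return math.sqrt(math.log((1.0 + s) / (1.0 - s)) / (2.0 * s))


def phi(k):
    """phi(k) = sup over a of P(a, k), for n = 1."""
    return P(a_of_k(k), k)


def k_of_eps(eps):
    """k(eps) of lemma 3 for n = 1, by bisection on the monotone function phi."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# phi approaches 1/2 as k -> 0 and 0 as k -> n = 1, as in steps (i) and (ii).
print(phi(0.001), phi(0.999))
k_star = k_of_eps(0.05)
print(k_star, a_of_k(k_star), phi(k_star))
```

For $n > 1$ no such closed form is used in the paper; the probability in (40) is then that of a non-central chi-square variable and would have to be evaluated numerically.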

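Let us now consider Student's hypothesis. The random variables $X_1, \dots, X_n$ are a sample from a normal distribution which under $H_0$ has mean 0 and unknown variance $\sigma^2$, while under $H_1$ the mean is $\xi_1$ and the variance $\sigma_1^2$. Without loss of generality we shall assume $\xi_1 > 0$. Applying theorem 1 with $\lambda$ a step function having a single jump at a point $\sigma_0 > \sigma_1$ to be determined later, we obtain the critical region in the form

(45)  $\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right)\sum x_i^2 - \frac{2\xi_1}{\sigma_1^2}\sum x_i \le c$.

Let $Y_i = X_i/\sigma$ so that under $H_0$ the $Y$'s are distributed with zero mean and unit variance. Then (45) becomes

(46)  $\sum Y_i^2 - 2\,\dfrac{\xi_1}{\sigma(1 - \sigma_1^2/\sigma_0^2)}\sum Y_i \le \dfrac{c\,\sigma_1^2}{\sigma^2(1 - \sigma_1^2/\sigma_0^2)}$,

which may be written as

(47)  $\sum (Y_i - a)^2 \le (n - k)a^2$,

where

(48)  $a = \dfrac{\xi_1}{\sigma(1 - \sigma_1^2/\sigma_0^2)}$, $k = -\dfrac{c\,\sigma_1^2(1 - \sigma_1^2/\sigma_0^2)}{\xi_1^2}$.

As $\sigma$ varies from 0 to $\infty$, $a$ goes from $\infty$ to 0. Let $P(a, k)$, $\varphi(k)$ and $a(k)$ be defined as in lemma 2. Given the level of significance $\epsilon$ $(0 < \epsilon < \tfrac12)$, let $k^*$ and $a^*$ be determined according to lemmas 2 and 3 so that

(49)  $\varphi(k^*) = \epsilon$ and $P(a^*, k^*) = \varphi(k^*)$.

We now select $\sigma_0 > \sigma_1$ and $c$ so that

(50)  $\dfrac{\xi_1}{(1 - \sigma_1^2/\sigma_0^2)\,\sigma_0} = a^*$ and $k^* = -\dfrac{c\,\sigma_1^2(1 - \sigma_1^2/\sigma_0^2)}{\xi_1^2}$.

We have to show that for this choice of $\sigma_0$ and $c$ the size of the critical region attains its maximum when $\sigma = \sigma_0$ and that this maximum size is $\epsilon$. Substituting from (50) we express the region (47) in the form

(51)  $\sum \left(Y_i - \frac{\sigma_0}{\sigma}a^*\right)^2 \le (n - k^*)\left(\frac{\sigma_0}{\sigma}a^*\right)^2$.

Thus the probability of the region is

(52)  $P\left(\frac{\sigma_0}{\sigma}a^*,\, k^*\right)$.

As $\sigma$ varies, (52) attains its maximum when $\frac{\sigma_0}{\sigma}a^* = a(k^*) = a^*$, that is, when $\sigma = \sigma_0$, and the maximum value of (52) is $\varphi(k^*) = \epsilon$.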

This derivation is valid even when $n = 1$, i.e., when the hypothesis $\xi = 0$ is to be tested by observing only a single random variable $X$, known to be normally distributed but whose mean $\xi$ and variance are unknown. For this problem no similar region exists. However, critical regions of the form $0 < \xi_1 - a \le x \le \xi_1 + b$ will give any level of significance $< \tfrac12$ for proper choice of $a$ and $b$, while the power of such regions will tend to 1 as $\sigma_1$ tends to 0. Therefore, the power of the most powerful test will be close to 1 if $\sigma_1$ is sufficiently small.
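The single-observation case lends itself to a direct numerical check. The following Python sketch is illustrative only: the function names, the grid search over $\sigma$ and the values $\xi_1 = 1$, $a = b = 0.2$ are hypothetical choices, not taken from the paper. It evaluates the size of an interval region $\xi_1 - a \le x \le \xi_1 + b$ under $\xi = 0$ as a function of $\sigma$, approximates its supremum, and evaluates the power for decreasing $\sigma_1$.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def size(xi1, a, b, sigma):
    """P(xi1 - a <= X <= xi1 + b) when X ~ N(0, sigma^2), i.e. under H0."""
    return Phi((xi1 + b) / sigma) - Phi((xi1 - a) / sigma)


def power(a, b, sigma1):
    """P(xi1 - a <= X <= xi1 + b) when X ~ N(xi1, sigma1^2), i.e. under H1."""
    return Phi(b / sigma1) - Phi(-a / sigma1)


def max_size(xi1, a, b, grid=20000, sigma_max=50.0):
    """Crude grid approximation to the supremum of the size over sigma > 0."""
    return max(size(xi1, a, b, sigma_max * j / grid) for j in range(1, grid + 1))


# Hypothetical interval around xi_1 = 1 with half-width 0.2.
level = max_size(1.0, 0.2, 0.2)
print(level)
for sigma1 in (0.2, 0.1, 0.05):
    print(sigma1, power(0.2, 0.2, sigma1))
```

The grid search is a deliberately simple stand-in for the maximization over $\sigma$; for a symmetric interval the supremum coincides with $\varphi(k)$ of lemma 2 with $n = 1$ and $\sqrt{1 - k} = a/\xi_1$.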





Having completed the discussion of the case $\epsilon < \tfrac12$ let us next suppose that $\epsilon \ge \tfrac12$. We shall need the following

Lemma 4. Let $c$ and $a_1$ be positive constants. Then there exists a function $f$ such that $f(a) = 0$ when $a < a_1$ and such that for all $w > 0$

(53)  $\int_0^\infty e^{-aw} f(a)\,da = k\,e^{-a_1 w - c\sqrt{w}}$.

This follows from the well known representation of $e^{-c\sqrt{w}}$ as a Laplace transform by applying a translation. (53) can be checked directly by substituting

(54)  $f(a) = \dfrac{c\,e^{-c^2/(4(a - a_1))}}{(a - a_1)^{3/2}}$ for $a > a_1$.

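Applying theorem 1 to Student's hypothesis, where again we shall assume $\xi_1$ to be positive, for proper choice of $k$ we obtain from (8)

(55)  $\dfrac{\exp\left[-\frac{1}{2\sigma_1^2}\sum x_i^2 + \frac{\xi_1}{\sigma_1^2}\sum x_i\right]}{\int \exp\left[-\frac{1}{2\sigma^2}\sum x_i^2\right] d\lambda(\sigma)} \ge 1$.

It follows from lemma 4 that for any positive $c$ there exists a non-decreasing function $\lambda$ of bounded variation with $\lambda(\sigma)$ constant for $\sigma > \sigma_1$, such that

(56)  $\int \exp\left[-\frac{1}{2\sigma^2}\sum x_i^2\right] d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2}\sum x_i^2 - c\sqrt{\sum x_i^2}\right]$.

For this choice of $\lambda$, (55) reduces to

(57)  $\exp\left[\frac{\xi_1}{\sigma_1^2}\sum x_i\right] \ge \exp\left[-c\sqrt{\sum x_i^2}\right]$,

and hence to

(58)  $\dfrac{\sum x_i}{\sqrt{\sum x_i^2}} \ge c'$.

This is a similar region and therefore most powerful for testing Student's hypothesis against $H_1$. By adjusting $c$, the size of the region can be made equal to any $\epsilon > \tfrac12$.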

The argument for $\epsilon \ge \tfrac12$ must be modified slightly in the case $n = 1$, that is, when we want to test Student's hypothesis on the basis of a single observation. Let us adjoin to the variable $X$ a random variable $Y$ known to be uniformly distributed over the interval $[0, 1]$. Using the same $\lambda$ and $k$ as before, (58) becomes

(59)  $\dfrac{x}{\sqrt{x^2}} \ge c'$.

For $c' = -1$ the critical region includes all points $(x, y)$ for which $x$ is positive, while (59) places no restriction on which of the remaining points to include in the critical region. The similar region

(60)  $x > 0$; $x < 0$, $0 \le y \le 2(\epsilon - \tfrac12)$

therefore satisfies all conditions of theorem 1 and hence is most powerful.

In extending these results to the general linear hypothesis, we shall assume the hypothesis reduced to canonical form [14, 15]. We shall therefore assume that $X_1, \dots, X_n$ are normally distributed with common variance $\sigma^2$ which is unknown under $H_0$ and has the value $\sigma_1^2$ under $H_1$. Furthermore, under $H_0$, $E(X_i) = 0$ for $i = 1, \dots, s, s + 1, \dots, m$; $E(X_i)$ unknown for $i = m + 1, \dots, n$, while under $H_1$, $E(X_i) = 0$ for $i = s + 1, \dots, m$; $E(X_i) = \xi_{i1}$ for the remaining values of $i$.

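For $\epsilon \le \tfrac12$ we shall consider critical regions of the form

(61)  $\dfrac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2\right]\right\}}{\exp\left\{-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{s} x_i^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2\right]\right\}} \ge k$,

which are obtained from (8) by substituting for $\lambda$ a step function with a single jump at the parameter point $(\sigma_0, \xi_{m+1,1}, \dots, \xi_{n,1})$. Making an orthonormal transformation from $x_1, \dots, x_s$ to $y_1, \dots, y_s$ such that $y_1 = \sum_{i=1}^{s} \xi_{i1} x_i \big/ \sqrt{\sum_{i=1}^{s} \xi_{i1}^2}$, and letting $y_i = x_i$ for $i = s + 1, \dots, m$; $y_i = x_i - \xi_{i1}$ for $i = m + 1, \dots, n$, (61) reduces to

(62)  $\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)\sum_{i=1}^{n} y_i^2 + \frac{2}{\sigma_1^2}\, y_1 \sqrt{\sum_{i=1}^{s} \xi_{i1}^2} \ge c$.

For $\sigma_0 > \sigma_1$ we can rewrite (62) as

(63)  $\sum_{i=m+1}^{n} y_i^2 \le \dfrac{1}{\sigma_1^{-2} - \sigma_0^{-2}}\left[\frac{2}{\sigma_1^2}\, y_1 \sqrt{\sum_{i=1}^{s} \xi_{i1}^2} - c\right] - \sum_{i=1}^{m} y_i^2$,

and we see that under $H_0$ for any $\sigma$ the size of this region, considered as a function of the unknown means of $Y_{m+1}, \dots, Y_n$, takes on its maximum when these means are zero, i.e. when $\xi_i = \xi_{i1}$ for $i = m + 1, \dots, n$. For these maximizing values of the means the existence of a suitable $\sigma_0$ and $c$ follows from the corresponding result in connection with Student's hypothesis.

Thus the most powerful test for testing $H_0$ against $H_1$ at level of significance $\epsilon \le \tfrac12$ has the form

(64)  $\sum_{i=1}^{s}\left(x_i - \dfrac{\xi_{i1}}{1 - \sigma_1^2/\sigma_0^2}\right)^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2 \le C$.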





It is interesting that the variables $X_i$ $(i = m + 1, \dots, n)$, which may be discarded when considerations are restricted to similar regions [18], do contribute to the power when similarity is not required. The same phenomenon also occurs in certain problems considered earlier in this paper.

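For the case $\epsilon \ge \tfrac12$, let us take

(65)  $\lambda(\sigma, \xi_{m+1}, \dots, \xi_n) = \lambda(\sigma) \prod_{i=m+1}^{n} \lambda_i(\xi_i \mid \sigma)$.

We shall select $\lambda(\sigma)$ such that $\lambda(\sigma)$ is constant when $\sigma > \sigma_1$. Hence it is enough to define $\lambda_i(\xi_i \mid \sigma)$ for $\sigma \le \sigma_1$. For any $\sigma \le \sigma_1$ there exists by lemma 1 a function $\lambda_i(\xi_i \mid \sigma)$ such that

(66)  $\int \exp\left[-\frac{1}{2\sigma^2}(x_i - \xi_i)^2\right] d\lambda_i(\xi_i \mid \sigma) = k \exp\left[-\frac{1}{2\sigma_1^2}(x_i - \xi_{i1})^2\right]$.

For this choice of the $\lambda_i$, (8) becomes

(67)  $\dfrac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2\right]\right\}}{\int \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m} x_i^2\right] d\lambda(\sigma)} \ge k'$.

Next we choose $\lambda(\sigma)$ according to lemma 4 such that

(68)  $\int \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m} x_i^2\right] d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m} x_i^2 - c\sqrt{\sum_{i=1}^{m} x_i^2}\right]$,

thus, by proper choice of $k'$, reducing (67) to

(69)  $\dfrac{\sum_{i=1}^{s} \xi_{i1} x_i}{\sqrt{\sum_{i=1}^{m} x_i^2}} \ge c''$.

The probability of this region under $H_0$ is independent of $\xi_{m+1}, \dots, \xi_n$ and $\sigma$, and hence (69) is most powerful for testing $H_0$ against $H_1$.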

Let us return once more to the problem of testing Student’s hypothesis against a simple alternative $\xi = \xi_1$, $\sigma = 1$ and let us assume as known that $\sigma \le 1$. No use can be made of this knowledge if consideration is restricted to similar regions. For the probability of first kind error is an analytic function of $\sigma$, and consequently, if a test is similar with respect to all values of $\sigma$ which are $\le 1$, it is similar with respect to all values of $\sigma$. Let us now consider this problem without the restriction of similarity. If $\varepsilon > \tfrac12$, the knowledge concerning $\sigma$ does not enable us to find a test which is more powerful than that given by (58), since the function $\lambda(\sigma)$ on which (58) was based had all its points of increase for $\sigma < 1$.

On the other hand we may expect improvement for $\varepsilon < \tfrac12$ since the most powerful test in this case was based on a function $\lambda$ with a single point of increase $\sigma_0 > 1$ which is no longer admitted as a possible value of $\sigma$. If, instead, we take for $\lambda$ the step function with a single jump at $\sigma = 1$ we obtain the critical region

(70) $\dfrac{\exp\left[-\tfrac12 \sum (x_i - \xi_1)^2\right]}{\exp\left[-\tfrac12 \sum x_i^2\right]} > k$

which is equivalent to

(71) $\bar{x} > c.$

Here $c > 0$ since $\varepsilon < \tfrac12$ and therefore, when $\xi = 0$ the probability of (71) is an increasing function of $\sigma$ and hence takes on its maximum at $\sigma = 1$. It follows from theorem 1 that (71) is most powerful under the conditions stated.
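The monotonicity used in this argument can be checked numerically. The following is only an illustrative sketch, not part of the original derivation: it assumes $n$ independent $N(0, \sigma^2)$ observations, so that the sample mean is $N(0, \sigma^2/n)$, and the function name and the sample values of $c$, $n$ and $\sigma$ are hypothetical.

```python
import math

def size_of_mean_test(c, sigma, n):
    """P(sample mean > c) when the n observations are N(0, sigma^2).

    The sample mean is N(0, sigma^2 / n), so the probability equals
    1 - Phi(c * sqrt(n) / sigma), written here with erfc.
    """
    z = c * math.sqrt(n) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sigmas = [0.25, 0.5, 0.75, 1.0]

# c > 0: the size grows with sigma, so over sigma <= 1 it is largest at sigma = 1.
sizes_pos = [size_of_mean_test(0.5, s, 10) for s in sigmas]

# c < 0, shown for comparison: the size shrinks as sigma grows.
sizes_neg = [size_of_mean_test(-0.5, s, 10) for s in sigmas]

print(sizes_pos)
print(sizes_neg)
```

With $c > 0$ every size stays below one half, which matches the restriction $\varepsilon < \tfrac12$ in the text.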

In the opposite problem in which it is known that $\sigma \ge 1$, the situation is reversed. For $\varepsilon < \tfrac12$ no improvement over (45) is possible while for $\varepsilon > \tfrac12$ we can use for $\lambda$ the step function with a single step at $\sigma = 1$ thus obtaining the critical region (70) but this time with $c < 0$. When $\xi = 0$ the probability of this region is a decreasing function of $\sigma$ and it follows that (70) is most powerful in this case.

Similar remarks apply to other problems. We mention as one further example a modification of the Behrens-Fisher problem. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be independently normally distributed, the X’s with mean $\xi$ and variance $\sigma^2$, the Y’s with mean $\eta$ and variance $\tau^2$, all four parameters being unknown. We wish to test, at level of significance $\varepsilon < \tfrac12$, the hypothesis $\xi = \eta$ against the simple alternative $\xi = \xi_1$, $\eta = \eta_1$, $\sigma = 1$, $\tau = 1$, where $\xi_1 \ne \eta_1$, and we assume it known that $\sigma \le 1$, $\tau \le 1$. Basing the test on a step function $\lambda$ with a single jump at $\sigma = 1$, $\tau = 1$, $\xi = \eta = \dfrac{n\xi_1 + m\eta_1}{m + n}$ we obtain for $w$ the region

(73) $\dfrac{\exp\left[-\tfrac12 \sum (x_i - \xi_1)^2 - \tfrac12 \sum (y_i - \eta_1)^2\right]}{\exp\left[-\tfrac12 \sum \left(x_i - \frac{n\xi_1 + m\eta_1}{m+n}\right)^2 - \tfrac12 \sum \left(y_i - \frac{n\xi_1 + m\eta_1}{m+n}\right)^2\right]} > k$

which is equivalent to

(74) $\bar{y} - \bar{x} > c \quad (c > 0),$

if we assume, as we may without loss of generality, that $\eta_1 > \xi_1$. When $\eta = \xi$, $\bar{Y} - \bar{X}$ is normally distributed with zero mean and variance $\dfrac{\sigma^2}{n} + \dfrac{\tau^2}{m}$. Therefore the probability of (74) is an increasing function of $\dfrac{\sigma^2}{n} + \dfrac{\tau^2}{m}$ and hence attains its maximum when $\sigma = \tau = 1$. It follows from theorem 1 that the region (74) is most powerful for the problem under consideration.

6. Admissibility. The general problem to be considered in this paper has been formulated in section 1: To obtain a region $w$

(75) maximizing $\int_w g(x)\,dx$

subject to the restriction

(76) $\int_w f_\theta(x)\,dx \le \varepsilon$ for all $\theta \in \omega$.

Since for any particular such problem there may exist several essentially different regions satisfying these conditions, it may happen that there exists a region $w'$ such that

(77) $\int_{w'} g(x)\,dx = \int_w g(x)\,dx,$

and

(78) $\int_{w'} f_\theta(x)\,dx \le \int_w f_\theta(x)\,dx$ for all $\theta \in \omega$

with inequality holding for some $\theta$. Clearly $w'$ is preferable to $w$. In this case, following the definition of Wald [4], we say that $w$ is not admissible. We shall rule out this possibility for a large class of problems by proving

Theorem 3. If $w$ satisfies the conditions of theorem 1, and if the set of points $x$ for which equality holds in (8) has measure zero, then any region satisfying (75) and (76) differs from $w$ only on a set of measure zero.

Proof. Without loss of generality we shall assume $\lambda$ of theorem 1 to be a distribution function. Then

$h(x) = \int_\omega f_\theta(x)\,d\lambda(\theta)$

is a completely specified probability density function, and $w$ is the unique³ (up to a set of measure zero) most powerful test for testing the simple hypothesis $H_0$: $h$ against the simple alternative $H_1$: $g$. Suppose now that $w'$ satisfies (75) and (76). Then

(79) $\int_{w'} h(x)\,dx \le \varepsilon,$

and $w'$ is most powerful for testing $H_0$ against $H_1$. It follows that $w'$ differs from $w$ at most by a null set.

³ One sees this easily from Neyman and Pearson’s proof of the fundamental lemma [1], by using the assumption that the set of points for which equality holds in (8) has measure zero.

Earlier we enlarged the problem of testing by adjoining to the original random variable $X$ a random variable with a known distribution. This is equivalent to the following modification of the original problem. Instead of defining a test to be a critical region (of rejection) in the space of $x$, we define it to be a critical function $\varphi$ $(0 \le \varphi(x) \le 1)$ which with every point $x$ associates a probability of rejection $\varphi(x)$. If $x$ is observed, the hypothesis is rejected with probability $\varphi(x)$ according to a table of random numbers. In the case where random numbers are not employed, $\varphi$ merely becomes the characteristic function of the set $w$.
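The passage from critical regions to critical functions can be illustrated with a short sketch. It is not taken from the paper: the helper names `indicator` and `reject`, the region $\{x > 1\}$ and the constant level 0.05 are all hypothetical, and a pseudo-random generator stands in for the table of random numbers.

```python
import random

def indicator(region):
    """Critical function of a non-randomized test: 1 on the region, 0 elsewhere."""
    return lambda x: 1.0 if region(x) else 0.0

def reject(phi, x, rng):
    """Reject the hypothesis with probability phi(x), using one uniform draw."""
    return rng.random() < phi(x)

rng = random.Random(0)

phi_region = indicator(lambda x: x > 1.0)  # hypothetical region w = {x > 1}
phi_const = lambda x: 0.05                 # a constant critical function

# With an indicator the decision never depends on the random draw.
always = all(reject(phi_region, 2.0, rng) for _ in range(100))
never = not any(reject(phi_region, 0.0, rng) for _ in range(100))

# With a constant critical function the long-run rejection rate is near phi.
rate = sum(reject(phi_const, 0.0, rng) for _ in range(20000)) / 20000
print(always, never, rate)
```

The indicator case reproduces an ordinary critical region, while the constant function is the kind of test that appears in (89) later in this section.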

We shall now state a theorem which will prove admissibility for all but one of 
those problems treated in sections two to five, to which theorem 3 does not apply. 

Theorem 4. Suppose $\omega = \{\theta\}$ is a subset of an s-dimensional Euclidean space, and that for any measurable function $\varphi$ and for any set $S$ which has positive measure and is contained in $\omega$

(80) $\int \varphi(x) f_\theta(x)\,dx = c$ for $\theta \in S$

implies

(81) $\int \varphi(x) f_\theta(x)\,dx = c$ for $\theta \in \omega$.

(Here and in all that follows whenever a region of integration is not indicated, the integral extends over the whole $x$ space.) Suppose further that $\varphi$ is a critical function satisfying the conditions of theorem 1 and that the set $S_0$ of points of increase of $\lambda$ has positive measure. Then $\varphi$ is admissible.

Proof. If $\varphi$ were not admissible there would exist $\varphi_1$ with

(82) $\int \varphi_1(x) g(x)\,dx = \int \varphi(x) g(x)\,dx;$

(83) $\int \varphi_1(x) f_\theta(x)\,dx \le \int \varphi(x) f_\theta(x)\,dx$ for all $\theta \in \omega$;

(84) $\int \varphi_1(x) f_\theta(x)\,dx < \int \varphi(x) f_\theta(x)\,dx$ for some $\theta \in \omega$.

The set $T$ of points $\theta$ for which (84) holds, differs from $\omega$ at most by a null set. For

(85) $\int [\varphi_1(x) - \varphi(x)] f_\theta(x)\,dx = 0$ for $\theta \in \omega - T$,

and if $\omega - T$ had positive measure, (85) would hold for all $\theta \in \omega$.

Let $h$ and $H_0$ be defined as in the proof of theorem 3. Since $S_0$ has positive measure, it follows that

(86) $\varepsilon = \int \varphi(x) h(x)\,dx > \int \varphi_1(x) h(x)\,dx = \eta$, say.

Let $\varphi_2(x) = \min[1, \varphi_1(x) + \varepsilon - \eta]$. Then

(87) $\int \varphi_2(x) h(x)\,dx \le \varepsilon$

and

(88) $\int \varphi_2(x) g(x)\,dx > \int \varphi_1(x) g(x)\,dx.$

But $\varphi_1$ is most powerful for testing $H_0$ against $H_1$ and we have a contradiction.

By applying theorems 3 and 4 one can easily show for all but one of the problems treated in sections three to five that the tests obtained there are admissible. The one exception occurs when testing equality of variances. Simplifying the notation, since we are now concerned with a special case, we shall assume that $X_i$ $(i = 1, \ldots, n)$, $Y_1, \ldots, Y_r$ are independently and normally distributed, the X’s with mean $\xi_0$ and variance $\sigma_0^2$, $Y_i$ with mean $\xi_i$ and variance $\sigma_i^2$, all parameters being unknown. We wish to test the hypothesis of equality of variances against the simple alternative

$\xi_i = \xi_{i1}, \quad \sigma_i = \sigma_{i1} \quad (i = 0, \ldots, r),$

with

$\sigma_{01} < \sigma_{11} < \cdots < \sigma_{r1}.$

We shall first consider the case $n = 1$, and prove admissibility of the critical function

(89) $\varphi(x, y_1, \ldots, y_r) = \varepsilon$

by using a different distribution function for the parameters from the one used earlier. With some specialization of the distribution function, (8) becomes for our problem


(90) 


f 1 / 
“'rs, 1 ’ 


foi) 2 — \ 23 2 (yi — ftl) 2 } 
Z i- 1 (Til ) 


/^{/ exp [- 


1 (X 


> k 


£o) d\ a (£o) 





(£,•)) 


d nM 


For any $\sigma < \sigma_{01}$ we select the $\lambda_i^{(\sigma)}(\xi_i)$ according to lemma 1. If we then take for $\mu$ the uniform distribution over $(\sigma_{01} - 1, \sigma_{01})$ the left hand side of (90) will reduce to $k$. Admissibility of the critical function (89) then follows from theorem 4.

That a constant critical function is not admissible in the case n > 1 is easily 
seen if one compares it for instance with the critical region 


(91) 


g - tbi 
V2(zi - x) 2 


We shall not obtain a complete family of admissible tests (cf. [4]) for the case $n > 1$ but we shall show that this problem is equivalent to the following one: To find a complete class of unbiased admissible tests for the hypothesis specifying the mean and variance of a normal distribution on the basis of a sample from this distribution, the class of alternatives being the totality of univariate normal distributions.

Let $n > 1$ and let $\varphi$ be any most powerful critical function for testing the hypothesis of equality of variances against $H_1$. If $\varphi$ corresponds to the level of significance $\varepsilon$ and if $\beta_\varphi$ denotes the power of $\varphi$, we have

(92) $\beta_\varphi(\sigma, \sigma, \ldots, \sigma, \xi_0, \ldots, \xi_r) \le \varepsilon$

for all admissible values of the arguments. It also follows from section 4 that

(93) $\beta_\varphi(\sigma_{01}, \sigma_{11}, \ldots, \sigma_{r1}, \xi_{01}, \xi_{11}, \ldots, \xi_{r1}) = \varepsilon.$

Consider for a moment the hypothesis $H_0'$: $\sigma_i = \sigma_{01}$ $(i = 0, \ldots, r)$, $\xi_0 = \xi_{01}$, $\xi_i$ unspecified for $i = 1, \ldots, r$. It is easily seen that the maximum power for testing $H_0'$ against $H_1$ is $\varepsilon$. Therefore any most powerful test for testing $H_0$ against $H_1$ is also most powerful for testing $H_0'$ against $H_1$, and in particular this holds for $\varphi$. Furthermore, it follows easily from theorem 4 that for any most powerful test of $H_0'$ against $H_1$ the probability of an error of the first kind must be identically equal to $\varepsilon$. Therefore

(94) $\beta_\varphi(\sigma_{01}, \ldots, \sigma_{01}, \xi_{01}, \xi_1, \ldots, \xi_r) = \varepsilon$ for all $\xi_1, \ldots, \xi_r$.

But (94) is equivalent to the condition that $\varphi$ is similar with respect to $\xi_1, \ldots, \xi_r$, and it follows [12] that $\varphi$ is a function of $x_1, \ldots, x_n$ only. The problem is therefore reduced to that of finding all admissible critical functions $\varphi(x_1, \ldots, x_n)$ satisfying

(95) $\beta_\varphi(\sigma_{01}, \xi_{01}) = \varepsilon; \quad \beta_\varphi(\sigma_0, \xi_0) \le \varepsilon$ for all $\sigma_0, \xi_0$.

That this problem in turn is equivalent to the one stated above is immediate when one considers the complementary critical functions $1 - \varphi$.

REFERENCES 

[1] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Roy. Soc. Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.

[2] A. Wald, “Test of statistical hypotheses concerning several parameters when the number of observations is large,” Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[3] A. Wald, “Statistical decision functions which minimize the maximum risk,” Annals of Math., Vol. 46 (1945), p. 265.

[4] A. Wald, “An essentially complete class of admissible decision-functions,” Annals of Math. Stat., Vol. 18 (1947), p. 549.

[5] J. Neyman, “On a statistical problem arising in routine analysis and in sampling inspection of mass production,” Annals of Math. Stat., Vol. 12 (1941), p. 46.

[6] H. Scheffé, “On the theory of testing composite hypotheses with one constraint,” Annals of Math. Stat., Vol. 13 (1942), p. 280.

[7] E. Lehmann, “On optimum tests of composite hypotheses with one constraint,” Annals of Math. Stat., Vol. 18 (1947), p. 473.

[8] E. Lehmann and H. Scheffé, “On the problem of similar regions,” Proc. Nat. Acad. Sci., Vol. 33 (1947), p. 382.

[9] A. Wald, “On the power function of the analysis of variance test,” Annals of Math. Stat., Vol. 13 (1942), p. 434.

[10] P. L. Hsu, “On the power function of the E²-test and the T²-test,” Annals of Math. Stat., Vol. 16 (1945), p. 278.

[11] H. K. Nandi, “On the average power of test criteria,” Sankhyā, Vol. 8 (1946), p. 67.

[12] W. Feller, “Note on regions similar to the sample space,” Stat. Res. Memoirs, Vol. 2 (1938), p. 117.

[13] P. L. Hsu, “Analysis of variance from the power function standpoint,” Biometrika, Vol. 32 (1941), p. 62.

[14] S. Kolodzieczyk, “On an important class of statistical hypotheses,” Biometrika, Vol. 27 (1935), p. 161.

[15] P. C. Tang, “The power function of the analysis of variance tests with tables and illustrations of their use,” Stat. Res. Memoirs, Vol. 2 (1938), p. 126.

[16] A. Wald, “Foundations of a general theory of sequential decision functions,” Econometrica, Vol. 15 (1947), p. 279.



SYMBOLIC MATRIX DERIVATIVES 

By Paul S. Dwyer and M. S. Macphail 
University of Michigan and Queen's University 

Summary. Let $X$ be the matrix $[x_{mn}]$, $t$ a scalar, and let $\partial X/\partial t$, $\partial t/\partial X$ denote the matrices $[\partial x_{mn}/\partial t]$, $[\partial t/\partial x_{mn}]$ respectively. Let $Y = [y_{pq}]$ be any matrix product involving $X$, $X'$ and independent matrices, for example $Y = AXBX'C$. Consider the matrix derivatives $\partial Y/\partial x_{mn}$, $\partial y_{pq}/\partial X$. Our purpose is to devise a systematic method for calculating these derivatives. Thus if $Y = AX$, we find that $\partial Y/\partial x_{mn} = AJ_{mn}$, $\partial y_{pq}/\partial X = A'K_{pq}$, where $J_{mn}$ is a matrix of the same dimensions as $X$, with all elements zero except for a unit in the m-th row and n-th column, and $K_{pq}$ is similarly defined with respect to $Y$.
We consider also the derivatives of sums, differences, powers, the inverse matrix 
and the function of a function, thus setting up a matrix analogue of elementary 
differential calculus. This is designed for application to statistics, and gives a 
concise and suggestive method for treating such topics as multiple regression 
and canonical correlation. 


1. Introduction. The derivative of a matrix with respect to a scalar

(1) $\dfrac{\partial Y}{\partial x} = \left[\dfrac{\partial y_{pq}}{\partial x}\right]$

is well known and commonly used. The symbolic derivative obtained by applying a matrix of differential operators to a scalar

(2) $\dfrac{\partial y}{\partial X} = \left[\dfrac{\partial y}{\partial x_{mn}}\right]$

is not in such general use though some authors give special cases. For example, if $A$ is a symmetric matrix and $X$ a column matrix, so that $y = X'AX$ is a quadratic form, Fraser, Duncan and Collar [1, p. 48] write

(3) $\begin{bmatrix} \partial/\partial x_1 \\ \partial/\partial x_2 \\ \vdots \\ \partial/\partial x_n \end{bmatrix} y = 2AX$

to indicate concisely the result of differentiating $y$ with respect to the elements $x_i$ of $X$.

It is to be noted that the matrix in (1) has the same dimensions (numbers of rows and columns) as the matrix $Y$, while the matrix in (2) has the dimensions of the matrix $X$.




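We present an illustration of each of these types of symbolic matrix derivatives in order to clarify the concepts. Thus if

$Y = \begin{bmatrix} 2x^3 & -6x^2 \\ \sin x & \log x \end{bmatrix}$

we have

$\dfrac{\partial Y}{\partial x} = \begin{bmatrix} 6x^2 & -12x \\ \cos x & x^{-1} \end{bmatrix},$

while if $y = x_{11}x_{32} - x_{31}x_{12}$ and

$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix},$

we have

$\dfrac{\partial y}{\partial X} = \begin{bmatrix} x_{32} & -x_{31} \\ 0 & 0 \\ -x_{12} & x_{11} \end{bmatrix}.$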

Suppose $Y$ is any matrix product involving $X$, $X'$ and independent matrices, for example, $Y = AXBX'C$. We may fix an element $x_{mn}$ of $X$ and form the matrix

(4) $\dfrac{\partial Y}{\partial x_{mn}},$

or we may fix an element $y_{pq}$ of $Y$ and form the matrix

(5) $\dfrac{\partial y_{pq}}{\partial X}.$

The purpose of this paper is to devise a systematic method for calculating these matrices, and to give various applications in the general field of statistics.

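By way of introduction we take the matrix product $Y = AX$ where

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$ and $X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix}$

so that

$Y = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{21} + a_{13}x_{31} & a_{11}x_{12} + a_{12}x_{22} + a_{13}x_{32} \\ a_{21}x_{11} + a_{22}x_{21} + a_{23}x_{31} & a_{21}x_{12} + a_{22}x_{22} + a_{23}x_{32} \end{bmatrix}.$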





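We have then

$\dfrac{\partial Y}{\partial x_{11}} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix}, \quad \dfrac{\partial Y}{\partial x_{12}} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{21} \end{bmatrix},$

$\dfrac{\partial Y}{\partial x_{21}} = \begin{bmatrix} a_{12} & 0 \\ a_{22} & 0 \end{bmatrix}, \quad \dfrac{\partial Y}{\partial x_{22}} = \begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix},$

$\dfrac{\partial Y}{\partial x_{31}} = \begin{bmatrix} a_{13} & 0 \\ a_{23} & 0 \end{bmatrix}, \quad \dfrac{\partial Y}{\partial x_{32}} = \begin{bmatrix} 0 & a_{13} \\ 0 & a_{23} \end{bmatrix}.$

These six equations can be combined in the single one

(6) $\dfrac{\partial Y}{\partial x_{mn}} = AJ_{mn},$

where $J_{mn}$ is a matrix having dimensions of $X$, with all elements zero except for a unit element in the m-th row and n-th column. Similarly we find

$\dfrac{\partial y_{11}}{\partial X} = \begin{bmatrix} a_{11} & 0 \\ a_{12} & 0 \\ a_{13} & 0 \end{bmatrix}, \quad \dfrac{\partial y_{12}}{\partial X} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{12} \\ 0 & a_{13} \end{bmatrix},$

$\dfrac{\partial y_{21}}{\partial X} = \begin{bmatrix} a_{21} & 0 \\ a_{22} & 0 \\ a_{23} & 0 \end{bmatrix}, \quad \dfrac{\partial y_{22}}{\partial X} = \begin{bmatrix} 0 & a_{21} \\ 0 & a_{22} \\ 0 & a_{23} \end{bmatrix}.$

These four equations can be combined in the single one

(7) $\dfrac{\partial y_{pq}}{\partial X} = A'K_{pq},$

where $K_{pq}$ is the matrix having the dimensions of $Y$ with all elements zero except for a unit element in the p-th row and q-th column.

It should be noted that the matrices on the left of (6) and (7) are matrices composed of the basic elements $\dfrac{\partial y_{pq}}{\partial x_{mn}}$.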

Other types of symbolic matrix derivatives could be defined and studied. We have selected these two main types because of their application to regression and correlation theory. The second type is more specifically indicated in the applications but the relations between the types are such that a simultaneous treatment seems appropriate.


2. Notation. Capital letters are used for matrices and small letters for scalars. It is understood that $Y, U, V, \ldots$ are matrices whose elements are functions of the elements $x_{mn}$ of $X$ and that $A, B, \ldots$ (unless otherwise stated) are matrices whose elements are not functions of $x_{mn}$. In the development of the formulas it is understood that the differentiation is carried out with respect to $x_{mn}$ or $X$. The matrix function differentiated is called $Y$.

We have already defined $J_{mn}$ as the matrix having the dimensions of $X$ with all elements zero except for a unit element in the m-th row and the n-th column, and we define $K_{pq}$ similarly with respect to $Y$. We now define $J'_{nm}$ as the matrix having the dimensions of $X'$ with all elements zero except for a unit element in the n-th row and the m-th column, and we define $K'_{qp}$ similarly with respect to $Y'$. All the formulas we obtain for $\dfrac{\partial Y}{\partial x_{mn}}$ involve $J_{mn}$ or $J'_{nm}$ while all those for $\dfrac{\partial y_{pq}}{\partial X}$ involve $K_{pq}$ or $K'_{qp}$.

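3. Differentiation of a constant. If $Y = A = [a_{pq}]$ we have at once

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial a_{pq}}{\partial x_{mn}} = 0.$

It follows that

(8) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}} [y_{pq}] = 0;$

(9) $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = 0,$

where the zero matrix of (8) has the dimensions of $A$, while that of (9) has the dimensions of $X$.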

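4. Differentiation of a matrix with respect to itself. If $Y = X = [x_{pq}]$ we note that

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial x_{pq}}{\partial x_{mn}} = \begin{cases} 1 & (p = m,\ q = n) \\ 0 & \text{(otherwise)}. \end{cases}$

It follows that

(10) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}} [y_{pq}] = J_{mn};$

(11) $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = K_{pq}.$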


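5. Differentiation of the transpose of a matrix with respect to the matrix. Let $Y = X'$, so that $y_{pq} = x_{qp}$. Then

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial x_{qp}}{\partial x_{mn}} = \begin{cases} 1 & (q = m,\ p = n); \\ 0 & \text{(otherwise)}, \end{cases}$

and we have

(12) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}} [y_{pq}] = J'_{nm},$

(13) $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = K'_{qp},$

where $J'_{nm}$, $K'_{qp}$ are defined as in section 2.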

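6. Differentiation of sums and differences of matrices. If

$Y = U + V - W = [u_{pq} + v_{pq} - w_{pq}]$

we have

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial u_{pq}}{\partial x_{mn}} + \dfrac{\partial v_{pq}}{\partial x_{mn}} - \dfrac{\partial w_{pq}}{\partial x_{mn}},$

$\dfrac{\partial}{\partial x_{mn}} [y_{pq}] = \dfrac{\partial}{\partial x_{mn}} [u_{pq} + v_{pq} - w_{pq}] = \dfrac{\partial}{\partial x_{mn}} [u_{pq}] + \dfrac{\partial}{\partial x_{mn}} [v_{pq}] - \dfrac{\partial}{\partial x_{mn}} [w_{pq}],$

(14) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial U}{\partial x_{mn}} + \dfrac{\partial V}{\partial x_{mn}} - \dfrac{\partial W}{\partial x_{mn}},$

and similarly

(15) $\dfrac{\partial y_{pq}}{\partial X} = \dfrac{\partial u_{pq}}{\partial X} + \dfrac{\partial v_{pq}}{\partial X} - \dfrac{\partial w_{pq}}{\partial X}.$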

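7. General formulas for the differentiation of a two factor matrix product. Suppose $U$ is a matrix with $r$ rows and $d$ columns and $V$ is a matrix with $d$ rows and $e$ columns, then

(16) $Y = UV = [y_{pq}] = \left[\sum_{s=1}^{d} u_{ps} v_{sq}\right].$

We have at once

(17) $\dfrac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s=1}^{d} \dfrac{\partial u_{ps}}{\partial x_{mn}} v_{sq} + \sum_{s=1}^{d} u_{ps} \dfrac{\partial v_{sq}}{\partial x_{mn}}.$

Now considering any fixed $x_{mn}$ it is clear that the first term on the right of (17) is the same as the right hand term of (16) with $\dfrac{\partial u_{ps}}{\partial x_{mn}}$ in place of $u_{ps}$. The second term on the right of (17) is likewise the same as the right hand term of (16) with $\dfrac{\partial v_{sq}}{\partial x_{mn}}$ in place of $v_{sq}$. We may then write

(18) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial U}{\partial x_{mn}} V + U \dfrac{\partial V}{\partial x_{mn}}.$

Also considering a fixed $y_{pq}$ we have

(19) $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \dfrac{\partial u_{ps}}{\partial X} v_{sq} + \sum_{s=1}^{d} u_{ps} \dfrac{\partial v_{sq}}{\partial X}.$

It is to be noted that this formula yields matrices of the proper dimensions (those of $X$) since $\dfrac{\partial u_{ps}}{\partial X}$ and $\dfrac{\partial v_{sq}}{\partial X}$ have the dimensions of $X$. These matrices, when multiplied by the scalar values $v_{sq}$ and $u_{ps}$ and summed, yield matrices of the desired dimensions.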


8. Some properties of matrix products involving J’s and K’s. Before deriving formulas for the differentiation of products of specific factors, it seems wise to derive some formulas exhibiting certain relations involving the J’s and K’s. Consider the matrix $A$ having $c$ rows and $d$ columns and the matrix $X$ having $d$ rows and $e$ columns. Then $Y = AX$ is a matrix with $c$ rows and $e$ columns, $J_{mn}$ one with $d$ rows and $e$ columns, $J'_{nm}$ one with $e$ rows and $d$ columns, $K_{pq}$ one with $c$ rows and $e$ columns and $K'_{qp}$ one with $e$ rows and $c$ columns.

It is easily seen by actual multiplication that

(20) $AJ_{mn}$ is a $c \times e$ matrix with all its elements zero except those of its n-th column which are those of the m-th column of $A$.

We omit further discussion of the dimensions of the matrices and assume that whenever a matrix product is written, the factors are conformable. Then we can show similarly that

(21) $J_{mn}B$ is a matrix with all its elements zero except those of its m-th row, which are those of the n-th row of $B$.

Similar statements hold if $J_{mn}$ is replaced by $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$. The rules are

(a) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the postmultiplier, the first subscript indicates the column of the other matrix which is placed in the column indicated by the second subscript.

(b) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the premultiplier, the second subscript indicates the row of the other matrix which is placed in the row indicated by the first subscript.

Notice also that

(22) $A'K_{pq}$ is a matrix with all elements zero except those of its q-th column, which are those of the p-th column of $A'$, or the p-th row of $A$. A similar result holds if $K_{pq}$ is replaced by $K'_{qp}$ or $J_{mn}$ or $J'_{nm}$.
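Statements (20), (21) and (22) are easy to see on a small instance. The sketch below is illustrative only: the matrices, the choice $c = 2$, $d = 3$, $e = 4$ and the helper names are hypothetical, and the code uses 0-based indices, so `unit(3, 4, 1, 2)` plays the role of $J_{23}$.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    """Matrix of zeros with a single unit in row i, column j (0-based)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(cols)]
            for r in range(rows)]

A = [[1, 2, 3], [4, 5, 6]]             # c = 2 rows, d = 3 columns
B = [[1, 2], [3, 4], [5, 6], [7, 8]]   # e = 4 rows, conformable with J B

J = unit(3, 4, 1, 2)                   # shaped like X (d x e)
K = unit(2, 4, 0, 3)                   # shaped like Y = AX (c x e)

AJ = matmul(A, J)              # (20): column 2 holds column 1 of A
JB = matmul(J, B)              # (21): row 1 holds row 2 of B
AtK = matmul(transpose(A), K)  # (22): column 3 holds row 0 of A

for name, M in (("AJ", AJ), ("JB", JB), ("A'K", AtK)):
    print(name, M)
```

In each product the unit matrix simply picks one column (as a postmultiplier) or one row (as a premultiplier) of the other factor and moves it, which is what rules (a) and (b) describe.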

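9. Differentiation of specific two factor products. Let us start with $Y = AX$ where the various matrices involved have the dimensions indicated in the last section. Application of (18), (8), (10) gives

(23) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial A}{\partial x_{mn}} X + A \dfrac{\partial X}{\partial x_{mn}} = 0 + AJ_{mn} = AJ_{mn},$

while application of (19), (11) yields

(24) $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \dfrac{\partial a_{ps}}{\partial X} x_{sq} + \sum_{s=1}^{d} a_{ps} \dfrac{\partial x_{sq}}{\partial X} = \sum_{s=1}^{d} a_{ps} K_{sq}$

$= a_{p1}K_{1q} + a_{p2}K_{2q} + \cdots + a_{pd}K_{dq}$

= a $d \times e$ matrix with all elements zero except those of its q-th column which are those of the p-th row of $A$

$= A'K_{pq}$ by (22).

Similar treatment of $Y = XB$ yields

(25) $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial X}{\partial x_{mn}} B + X \dfrac{\partial B}{\partial x_{mn}} = J_{mn}B,$

(26) $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s} \dfrac{\partial x_{ps}}{\partial X} b_{sq} = \sum_{s} K_{ps} b_{sq} = K_{pq}B'.$

If we treat $Y = AX'$ in a similar fashion, we get

(27) $\dfrac{\partial Y}{\partial x_{mn}} = AJ'_{nm},$

(28) $\dfrac{\partial y_{pq}}{\partial X} = K'_{qp}A,$

while $Y = X'B$ yields

(29) $\dfrac{\partial Y}{\partial x_{mn}} = J'_{nm}B,$

(30) $\dfrac{\partial y_{pq}}{\partial X} = BK'_{qp}.$

It is to be noted that $J$ always has the subscripts $mn$, and similarly we find always $J'_{nm}$, $K_{pq}$, $K'_{qp}$. We may therefore omit the subscripts on these letters. When we do so we shall also write

$\dfrac{\partial Y}{\partial \langle X \rangle}$ for $\dfrac{\partial Y}{\partial x_{mn}}$, and $\dfrac{\partial \langle Y \rangle}{\partial X}$ for $\dfrac{\partial y_{pq}}{\partial X}$,

placing brackets $\langle\ \rangle$ around the matrix from which a fixed element is to be chosen. Thus if $Y = AX$, we write instead of (23) and (24)

(23a) $\dfrac{\partial Y}{\partial \langle X \rangle} = AJ;$

(24a) $\dfrac{\partial \langle Y \rangle}{\partial X} = A'K.$

The other results are summarized in lines 1-5 of Table I.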

Examination of (18) and (19) shows that the derivatives of products with two variable factors are obtained by adding the results obtained by holding each factor constant while differentiating the other. With this in mind, (23)-(30) can be used to obtain the derivatives of double products involving $X$ and $X'$. Thus if $Y = XX$, we get

(31) $\dfrac{\partial Y}{\partial \langle X \rangle} = JX + XJ, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = KX' + X'K.$

Other double product formulas involving $X$ and $X'$ are given in Table I.


TABLE I

Formula   Y       ∂Y/∂⟨X⟩          ∂⟨Y⟩/∂X
1         AB      0                0
2         AX      AJ               A'K
3         XB      JB               KB'
4         AX'     AJ'              K'A
5         X'B     J'B              BK'
6         XX      JX + XJ          KX' + X'K
7         X'X     J'X + X'J        XK' + XK
8         XX'     JX' + XJ'        KX + K'X
9         X'X'    J'X' + X'J'      X'K' + K'X'


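The formulas for $\dfrac{\partial Y}{\partial \langle X \rangle}$ are written down very easily, but those for $\dfrac{\partial \langle Y \rangle}{\partial X}$ are not so easy to write. However the values of $\dfrac{\partial Y}{\partial \langle X \rangle}$ and $\dfrac{\partial \langle Y \rangle}{\partial X}$ in formulas 2-5 of Table I are such that the results for $\dfrac{\partial \langle Y \rangle}{\partial X}$ may be obtained from those for $\dfrac{\partial Y}{\partial \langle X \rangle}$ with the use of a few simple rules. They are

(a) Each $J$ becomes $K$ and each $J'$ becomes $K'$.

(b) The pre (or post) multiplier of $J$ becomes its transpose.

(c) The pre (or post) multiplier of $J'$ becomes a post (or pre) multiplier of $K'$.

These rules are immediately applicable to the double products. Thus when $Y = X'X$ we have

$\dfrac{\partial Y}{\partial \langle X \rangle} = J'X + X'J$

and so

$\dfrac{\partial \langle Y \rangle}{\partial X} = XK' + XK.$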
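Line 7 of Table I, which the rules above reproduce, can be verified exactly on an integer example. This is a sketch under stated assumptions rather than part of the paper: the helper names and the 3 × 2 matrix are hypothetical, indices are 0-based, and the derivative is taken by a central difference with unit step, which is exact because each element of $X'X$ is at most quadratic in any single $x_{mn}$.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(P, Q)]

def unit(rows, cols, i, j):
    return [[1 if (r, c) == (i, j) else 0 for c in range(cols)]
            for r in range(rows)]

def gram(X):
    """Y = X'X."""
    return matmul(transpose(X), X)

def d_wrt_element(F, X, m, n):
    """dF(X)/dx_mn by a unit central difference (exact for quadratics)."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[m][n] += 1
    Xm[m][n] -= 1
    return [[(a - b) // 2 for a, b in zip(r, s)]
            for r, s in zip(F(Xp), F(Xm))]

def d_of_element(F, X, p, q):
    """Matrix (shaped like X) of derivatives of the element y_pq."""
    return [[d_wrt_element(F, X, m, n)[p][q] for n in range(len(X[0]))]
            for m in range(len(X))]

X = [[1, 2], [3, 4], [5, 6]]
Xt = transpose(X)

ok_J = True   # dY/d<X> = J'X + X'J
for m in range(3):
    for n in range(2):
        J = unit(3, 2, m, n)
        rhs = add(matmul(transpose(J), X), matmul(Xt, J))
        ok_J = ok_J and d_wrt_element(gram, X, m, n) == rhs

ok_K = True   # d<Y>/dX = XK' + XK
for p in range(2):
    for q in range(2):
        K = unit(2, 2, p, q)
        rhs = add(matmul(X, transpose(K)), matmul(X, K))
        ok_K = ok_K and d_of_element(gram, X, p, q) == rhs

print(ok_J, ok_K)
```

Replacing `gram` by another product from Table I, together with the matching right-hand sides, checks the other lines in the same way.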


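10. Differentiation of three (or more) factor products. Products with three factors can be differentiated by the formulas of the last section if two adjacent factors are constant. Thus if $Y = ABX$, we have

(32) $\dfrac{\partial Y}{\partial \langle X \rangle} = ABJ, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = B'A'K.$

It is not yet demonstrated that these rules are applicable to the products $AXB$ and $AX'B$. However it can be shown by the general methods indicated earlier that if $Y = AXB$, we obtain

(33) $\dfrac{\partial Y}{\partial \langle X \rangle} = AJB, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = A'KB',$

while if $Y = AX'B$ we have

(34) $\dfrac{\partial Y}{\partial \langle X \rangle} = AJ'B, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = BK'A.$

It is now apparent that the rules of the last section apply to situations in which there are both pre and post multipliers.

The general theory for two-factor products is immediately extendable. Thus if $Y = UVW$ with $y_{pq} = \sum_{s} \sum_{r} u_{ps} v_{sr} w_{rq}$ then the basic element is

(35) $\dfrac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s} \sum_{r} \dfrac{\partial u_{ps}}{\partial x_{mn}} v_{sr} w_{rq} + \sum_{s} \sum_{r} u_{ps} \dfrac{\partial v_{sr}}{\partial x_{mn}} w_{rq} + \sum_{s} \sum_{r} u_{ps} v_{sr} \dfrac{\partial w_{rq}}{\partial x_{mn}},$

and the formulas result from treating each factor in turn as the only variable. For example if $Y = XX'X$, we have

(36) $\dfrac{\partial Y}{\partial \langle X \rangle} = JX'X + XJ'X + XX'J,$

and

(37) $\dfrac{\partial \langle Y \rangle}{\partial X} = K(X'X)' + XK'X + (XX')'K = KX'X + XK'X + XX'K.$

The symbolic derivatives of certain triple product matrices are presented in Table II.

The rules are sufficiently general to take care of matrices with more than three factors. Thus if $Y = A'X'XB$, we have

(38) $\dfrac{\partial Y}{\partial \langle X \rangle} = A'J'XB + A'X'JB$

and

(39) $\dfrac{\partial \langle Y \rangle}{\partial X} = XBK'A' + XAKB',$

and in the special case $B = A$, we get

(40) $\dfrac{\partial Y}{\partial \langle X \rangle} = A'(J'X + X'J)A,$

(41) $\dfrac{\partial \langle Y \rangle}{\partial X} = XA(K' + K)A'.$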


Similarly if $Y = X'A'AX$, we get

(42) $\dfrac{\partial Y}{\partial \langle X \rangle} = J'A'AX + X'A'AJ,$

and

(43) $\dfrac{\partial \langle Y \rangle}{\partial X} = A'AXK' + A'AXK.$

TABLE II

Formula   Y        ∂Y/∂⟨X⟩                      ∂⟨Y⟩/∂X
1         ABC      0                            0
2         ABX      ABJ                          B'A'K
3         AXC      AJC                          A'KC'
4         XBC      JBC                          KC'B'
5         ABX'     ABJ'                         K'AB
6         AX'C     AJ'C                         CK'A
7         X'BC     J'BC                         BCK'
8         AXX      AJX + AXJ                    A'KX' + X'A'K
9         XBX      JBX + XBJ                    KX'B' + B'X'K
10        XXC      JXC + XJC                    KC'X' + X'KC'
11        AX'X'    AJ'X' + AX'J'                X'K'A + K'AX'
12        X'BX'    J'BX' + X'BJ'                BX'K' + K'X'B
13        X'X'C    J'X'C + X'J'C                X'CK' + CK'X'
14        AX'X     AJ'X + AX'J                  XK'A + XA'K
15        X'BX     J'BX + X'BJ                  BXK' + B'XK
16        X'XC     J'XC + X'JC                  XCK' + XKC'
17        AXX'     AJX' + AXJ'                  A'KX + K'AX
18        XBX'     JBX' + XBJ'                  KXB' + K'XB
19        XX'C     JX'C + XJ'C                  KC'X + CK'X
20        XXX      JXX + XJX + XXJ              KX'X' + X'KX' + X'X'K
21        XXX'     JXX' + XJX' + XXJ'           KXX' + X'KX + K'XX
22        XX'X     JX'X + XJ'X + XX'J           KX'X + XK'X + XX'K
23        X'XX     J'XX + X'JX + X'XJ           XXK' + XKX' + X'XK
24        XX'X'    JX'X' + XJ'X' + XX'J'        KXX + X'K'X + K'XX'
25        X'XX'    J'XX' + X'JX' + X'XJ'        XX'K' + XKX + K'X'X
26        X'X'X    J'X'X + X'J'X + X'X'J        X'XK' + XK'X' + XXK
27        X'X'X'   J'X'X' + X'J'X' + X'X'J'     X'X'K' + X'K'X' + K'X'X'

Finally if $Y = XAX'AX$, we get

(44) $\dfrac{\partial Y}{\partial \langle X \rangle} = JAX'AX + XAJ'AX + XAX'AJ,$

(45) $\dfrac{\partial \langle Y \rangle}{\partial X} = KX'A'XA' + AXK'XA + A'XA'X'K.$


11. Vector results. It should be emphasized that each of the above results is a general result. More specific results may be obtained in case one (or more) of the matrices is a vector. For example if $X_c$ is a column matrix and $Y = X_c'BX_c$, then $Y$ is a scalar, so $K$ and $K'$ are both unity and we have from Table II (15)

(46) $\dfrac{\partial \langle Y \rangle}{\partial X_c} = BX_c + B'X_c = (B + B')X_c.$

If in addition $B$ is symmetric, $B' = B$ and we have

$\dfrac{\partial \langle Y \rangle}{\partial X_c} = 2BX_c,$

which is the result indicated in (3).


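12. Differentiation of the inverse of X. It is possible to use implicit differentiation to derive formulas for $\dfrac{\partial X^{-1}}{\partial \langle X \rangle}$ and $\dfrac{\partial \langle X^{-1} \rangle}{\partial X}$. We write $I = XX^{-1}$ and get

$\dfrac{\partial I}{\partial \langle X \rangle} = 0 = JX^{-1} + X \dfrac{\partial X^{-1}}{\partial \langle X \rangle},$

so that

(47) $\dfrac{\partial X^{-1}}{\partial \langle X \rangle} = -X^{-1}JX^{-1},$

whence

(48) $\dfrac{\partial \langle X^{-1} \rangle}{\partial X} = -(X^{-1})'K(X^{-1})'.$

The formula (47) is a generalization of a known matrix differential formula [33.4].

In a similar way we derive

(49) $\dfrac{\partial (X')^{-1}}{\partial \langle X \rangle} = -(X')^{-1}J'(X')^{-1},$

(50) $\dfrac{\partial \langle (X')^{-1} \rangle}{\partial X} = -(X')^{-1}K'(X')^{-1}.$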
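Formulas (47) and (48) can be compared with numerical derivatives. The following sketch makes several assumptions that are not in the paper: it is restricted to a 2 × 2 matrix so that the inverse can be written in closed form, the matrix entries and helper names are hypothetical, indices are 0-based, and the derivative is approximated by a central difference, so agreement holds only up to a small tolerance.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(cols)]
            for r in range(rows)]

def inv2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def neg(P):
    return [[-v for v in row] for row in P]

def d_inverse(X, m, n, h=1e-6):
    """Central-difference approximation to d(X^-1)/dx_mn."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[m][n] += h
    Xm[m][n] -= h
    Ip, Im = inv2(Xp), inv2(Xm)
    return [[(Ip[p][q] - Im[p][q]) / (2 * h) for q in range(2)]
            for p in range(2)]

def max_abs_diff(P, Q):
    return max(abs(a - b) for r, s in zip(P, Q) for a, b in zip(r, s))

X = [[2.0, 1.0], [0.5, 3.0]]      # deliberately non-symmetric
Xi = inv2(X)
Xit = transpose(Xi)

# (47): d(X^-1)/d<X> = -X^-1 J X^-1
err_47 = max(
    max_abs_diff(d_inverse(X, m, n),
                 neg(matmul(matmul(Xi, unit(2, 2, m, n)), Xi)))
    for m in range(2) for n in range(2))

# (48): d<X^-1>/dX = -(X^-1)' K (X^-1)'
err_48 = max(
    max_abs_diff([[d_inverse(X, m, n)[p][q] for n in range(2)]
                  for m in range(2)],
                 neg(matmul(matmul(Xit, unit(2, 2, p, q)), Xit)))
    for p in range(2) for q in range(2))

print(err_47, err_48)
```

A non-symmetric $X$ is used so that the transposes in (48) make a visible difference; with a symmetric matrix the two formulas would look the same.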


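13. Differentiation of a function of a function. The theory developed in the earlier sections is sufficiently general to be useful in differentiating a function of a function if the functions involve addition, subtraction, premultiplication, postmultiplication, and inverse. For example if

(51) $Y = Z'Z$ with $Z = AX$

we have

$\dfrac{\partial Y}{\partial \langle X \rangle} = \dfrac{\partial Z'}{\partial \langle X \rangle} Z + Z' \dfrac{\partial Z}{\partial \langle X \rangle},$

and since

$\dfrac{\partial Z}{\partial \langle X \rangle} = AJ, \qquad \dfrac{\partial Z'}{\partial \langle X \rangle} = J'A',$

(52) $\dfrac{\partial Y}{\partial \langle X \rangle} = J'A'Z + Z'AJ,$

and thence

(53) $\dfrac{\partial \langle Y \rangle}{\partial X} = A'ZK' + A'ZK.$

These results are equivalent to those of (42) and (43).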


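14. Differentiation of a power of a square matrix. The values of the symbolic derivatives of $X^2$, $X^3$ with respect to $X$ are given in Tables I and II. It can be shown similarly that if $n$ is a positive integer

(54) $\dfrac{\partial X^n}{\partial \langle X \rangle} = JX^{n-1} + \sum_{s=1}^{n-2} X^s J X^{n-s-1} + X^{n-1}J,$

and this can be written as

(55) $\dfrac{\partial X^n}{\partial \langle X \rangle} = \sum_{s=0}^{n-1} X^s J X^{n-s-1}$

if we adopt the convention that $X^0$ is $I$. It follows at once that

(56) $\dfrac{\partial \langle X^n \rangle}{\partial X} = \sum_{s=0}^{n-1} (X')^s K (X')^{n-s-1}.$

It is thence possible to derive formulas for the symbolic derivatives of $X^{-n}$. Since $X^{-n}X^n = I$, we have

(57) $\dfrac{\partial X^{-n}}{\partial \langle X \rangle} X^n + X^{-n} \left[\sum_{s=0}^{n-1} X^s J X^{n-s-1}\right] = 0,$

so

(58) $\dfrac{\partial X^{-n}}{\partial \langle X \rangle} = -X^{-n} \left[\sum_{s=0}^{n-1} X^s J X^{n-s-1}\right] X^{-n},$

and

(59) $\dfrac{\partial \langle X^{-n} \rangle}{\partial X} = -(X')^{-n} \left[\sum_{s=0}^{n-1} (X')^s K (X')^{n-s-1}\right] (X')^{-n}.$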
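The sums in (55) and (56) can be checked against numerical derivatives of a matrix power. This is a rough sketch and not the authors' computation: the 2 × 2 matrix, the exponent $n = 4$, the step size and the helper names are all hypothetical choices, indices are 0-based, and the comparison is approximate because a central difference is used.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(P, Q)]

def unit(size, i, j):
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(size)]
            for r in range(size)]

def power(X, n):
    """X^n for n >= 0, with X^0 equal to the identity."""
    R = [[1.0 if r == c else 0.0 for c in range(len(X))]
         for r in range(len(X))]
    for _ in range(n):
        R = matmul(R, X)
    return R

def sandwich_sum(M, U, n):
    """sum over s = 0..n-1 of M^s U M^(n-s-1)."""
    total = [[0.0] * len(M) for _ in M]
    for s in range(n):
        total = add(total, matmul(matmul(power(M, s), U),
                                  power(M, n - s - 1)))
    return total

def d_power(X, n, m, k, h=1e-5):
    """Central-difference approximation to d(X^n)/dx_mk."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[m][k] += h
    Xm[m][k] -= h
    Pp, Pm = power(Xp, n), power(Xm, n)
    return [[(a - b) / (2 * h) for a, b in zip(r, s)]
            for r, s in zip(Pp, Pm)]

def max_abs_diff(P, Q):
    return max(abs(a - b) for r, s in zip(P, Q) for a, b in zip(r, s))

X = [[1.0, 2.0], [0.5, -1.0]]
n = 4

# (55): d(X^n)/d<X> = sum_s X^s J X^(n-s-1)
err_55 = max(
    max_abs_diff(d_power(X, n, m, k), sandwich_sum(X, unit(2, m, k), n))
    for m in range(2) for k in range(2))

# (56): d<X^n>/dX = sum_s (X')^s K (X')^(n-s-1)
err_56 = max(
    max_abs_diff([[d_power(X, n, m, k)[p][q] for k in range(2)]
                  for m in range(2)],
                 sandwich_sum(transpose(X), unit(2, p, q), n))
    for p in range(2) for q in range(2))

print(err_55, err_56)
```

The same `sandwich_sum` helper serves both formulas because (56) is (55) with $X$ replaced by $X'$ and $J$ by $K$.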


15. Applications. We consider the classical theory of least squares, a matrix presentation of which is available in [2]. Suppose that $y$ and $x_i$ are measured from their means and that $y$ is to be estimated from the $n$ variables $x_i$. Form the values of $y$ into a column matrix $Y$ and the values of $x_i$ into an $N$ by $n$ matrix $X$. Introduce the column matrix $B$ of $n$ parameters and define

(60) $E = Y - XB.$

Note that the matrix $E'E$ is in this case the single element matrix which is the sum of the squares of the residuals. Following the least squares method we minimize this by differentiating with respect to the elements of $B$. We first note that

(61) $E'E = (Y' - B'X')(Y - XB) = Y'Y - Y'XB - B'X'Y + B'X'XB.$

Then we write down first

(62) $\dfrac{\partial (E'E)}{\partial \langle B \rangle} = -Y'XJ - J'X'Y + J'X'XB + B'X'XJ,$

from which we get

(63) $\dfrac{\partial \langle E'E \rangle}{\partial B} = -X'YK - X'YK' + X'XBK' + X'XBK = -X'(Y - XB)(K + K') = -X'E(K + K').$

The J’s and K’s are associated with $B$ and $E'E$ respectively. Here $E'E$ is scalar so that $K = K' = 1$ and we have

(64) $\dfrac{\partial \langle E'E \rangle}{\partial B} = -2X'E.$

The equation $X'E = 0$, obtained by equating the right hand side of (64) to zero, is a statement of the normal equations in matrix form.
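The gradient (64) and the normal equations can be checked in exact rational arithmetic. The sketch below is an illustration only: the four observations, the two regressors (each chosen to sum to zero), the trial value `B0` and the helper names are hypothetical, and the 2 × 2 inverse is written out by hand to keep the block self-contained.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def residual(X, Y, B):
    """E = Y - XB, as in (60)."""
    XB = matmul(X, B)
    return [[y[0] - f[0]] for y, f in zip(Y, XB)]

def sum_sq(X, Y, B):
    """The scalar E'E."""
    return sum(e[0] ** 2 for e in residual(X, Y, B))

X = [[Fraction(v) for v in row]
     for row in [[-2, 1], [-1, 0], [1, -2], [2, 1]]]
Y = [[Fraction(v)] for v in [3, -1, -4, 2]]
Xt = transpose(X)

# Gradient (64) at an arbitrary B0, against an exact unit central
# difference (E'E is quadratic in each element of B).
B0 = [[Fraction(1)], [Fraction(-1)]]
grad_formula = [[-2 * v for v in row]
                for row in matmul(Xt, residual(X, Y, B0))]
grad_numeric = []
for i in range(2):
    Bp = [row[:] for row in B0]
    Bm = [row[:] for row in B0]
    Bp[i][0] += 1
    Bm[i][0] -= 1
    grad_numeric.append([(sum_sq(X, Y, Bp) - sum_sq(X, Y, Bm)) / 2])

# Solving X'XB = X'Y makes the gradient vanish, i.e. X'E = 0.
B_hat = matmul(inv2(matmul(Xt, X)), matmul(Xt, Y))
normal_eq = matmul(Xt, residual(X, Y, B_hat))

print(grad_formula == grad_numeric, normal_eq)
```

Using `Fraction` avoids rounding, so the residuals at `B_hat` are orthogonal to the columns of $X$ exactly rather than to within a tolerance.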

Equation (64) may also be obtained with the use of the methods of section 13. In this case

$\dfrac{\partial E}{\partial \langle B \rangle} = -XJ, \qquad \dfrac{\partial E'}{\partial \langle B \rangle} = -J'X',$

and we have

(65) $\dfrac{\partial (E'E)}{\partial \langle B \rangle} = \dfrac{\partial E'}{\partial \langle B \rangle} E + E' \dfrac{\partial E}{\partial \langle B \rangle} = -J'X'E - E'XJ;$

so

(66) $\dfrac{\partial \langle E'E \rangle}{\partial B} = -X'EK' - X'EK = -X'E(K' + K).$

The equation (64) is also applicable to the more general problem in which $y_1$ and $y_2$ are estimated from the same set of variables $x_i$. The only change needed is to regard $Y$, $B$, $E$ as two-column matrices so that $E'E$ is a matrix with two rows and columns which we denote by

$\begin{bmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{bmatrix}.$

We require $\dfrac{\partial e_{11}}{\partial B} = 0$ and $\dfrac{\partial e_{22}}{\partial B} = 0$. From equation (63), inserting subscripts, we get

$\dfrac{\partial e_{11}}{\partial B} = -X'E(K_{11} + K'_{11}) = -2X'EK_{11};$

$\dfrac{\partial e_{22}}{\partial B} = -2X'EK_{22}.$

It is easily seen that $\dfrac{\partial e_{11}}{\partial B} = \dfrac{\partial e_{22}}{\partial B} = 0$ is equivalent to $X'E = 0$, the same equation as we obtained in the last paragraph. We also arrive at the incidental result that in minimizing $\sum e_1^2$ and $\sum e_2^2$ separately we find at the same time a stationary value of $\sum e_1 e_2$.

In this way we can treat two or more simultaneous regression problems with this general notation as easily as we can treat one.

As a second application of the theory we outline the initial steps in the direction
of the formulas for canonical correlation [4], [5]. In this case A and B are
unknown column vectors with X and Y known rectangular matrices. Then
XA is a column matrix:

        $XA = L = \begin{bmatrix} l_1 \\ l_2 \\ \vdots \\ l_N \end{bmatrix}$

whose elements $l_i$ may be regarded as observed values of a linear form $l$. Similarly
$YB = \Lambda$, a column matrix whose elements may be regarded as observed
values of a linear form $\lambda$. It is desired to find A and B such that $l$ and $\lambda$ may have
the largest correlation coefficient, and to find the size of this coefficient. Then
A'X'XA, B'Y'YB, and B'Y'XA = A'X'YB are scalars, and

(67)    $\rho = \dfrac{B'Y'XA}{\sqrt{(A'X'XA)(B'Y'YB)}}.$

If the scales of X and Y are chosen so that A'X'XA = 1 and B'Y'YB = 1, we
have

(68)    $\rho = B'Y'XA = A'X'YB.$

Using Lagrange multipliers we set

(69)    $\phi = B'Y'XA + \dfrac{c}{2}(1 - A'X'XA) + \dfrac{d}{2}(1 - B'Y'YB),$

and differentiate with respect to the elements of A and B. We first differentiate
$\phi$ with respect to A after replacing B'Y'XA by A'X'YB:

(70)    $\dfrac{\partial \phi}{\partial \langle A \rangle} = J'X'YB - \dfrac{c}{2}(J'X'XA + A'X'XJ);$

(71)    $\dfrac{\partial \langle \phi \rangle}{\partial A} = X'YBK' - \dfrac{c}{2}(X'XAK' + X'XAK).$

(The J's and K's are associated with A and $\phi$ respectively.) We set $\dfrac{\partial \langle \phi \rangle}{\partial A} = 0$
with $K = K' = 1$ to get

(72)    $X'YB = cX'XA,$

whence by (68)

(73)    $\rho = A'X'YB = cA'X'XA = c,$

and

(74)    $X'YB = \rho X'XA.$

Similar differentiation with respect to B gives $\rho = d$ and

(75)    $Y'XA = \rho Y'YB.$

The further steps in the development of canonical correlation theory are based 
on (74) and (75). 

A third application is to orthogonal regression. The situation is very similar 
to that of the first illustration, but the errors are measured orthogonal to the 
plane of best fit. As before we take the variates as measured from their means 
and so have the basic equation 

(76)    $D = \dfrac{b_1x_1 + b_2x_2 + \cdots + b_kx_k}{\sqrt{b_1^2 + b_2^2 + \cdots + b_k^2}}.$

This can be written as

(77)    $D = l_1x_1 + l_2x_2 + \cdots + l_kx_k = XL$ with $L'L = 1.$

It follows that the quantity to be minimized is

(78)    $D'D = L'X'XL.$

With the use of Lagrange multipliers we have

(79)    $\phi = L'X'XL + \lambda(1 - L'L)$

so that

(80)    $\dfrac{\partial \phi}{\partial \langle L \rangle} = J'X'XL + L'X'XJ - \lambda(J'L + L'J),$

(81)    $\dfrac{\partial \langle \phi \rangle}{\partial L} = X'XLK' + X'XLK - \lambda(LK' + LK),$

from which

(82)    $2X'XL - 2\lambda L = 0$





and the values can be determined from the equation

(83)    $(X'X - \lambda)L = 0.$

The solution continues with the use of the characteristic equation.
It is to be noted from (79) and (82) that

        $D'D = L'X'XL = \lambda L'L = \lambda,$

so that (83) becomes

(84)    $(X'X - D'D)L = 0.$
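
As an illustration of (83) and (84), the two-variable case can be worked in closed form. The sketch below is an assumption-laden example rather than the authors' procedure: it takes the smaller characteristic root of the 2 by 2 matrix $X'X$, forms the unit vector L, and confirms that $D'D$ equals that root. The data and the function names are hypothetical.

```python
from math import sqrt

def smallest_eigenpair_2x2(a, b, c):
    """Smallest eigenvalue and unit eigenvector of [[a, b], [b, c]] (assumes b != 0)."""
    lam = ((a + c) - sqrt((a - c) ** 2 + 4.0 * b * b)) / 2.0
    v = (b, lam - a)                      # satisfies (M - lam) v = 0
    norm = sqrt(v[0] ** 2 + v[1] ** 2)
    return lam, (v[0] / norm, v[1] / norm)

def orthogonal_fit(x1, x2):
    """Return (lambda, L, D'D) for variates x1, x2 measured from their means."""
    a = sum(u * u for u in x1)                  # entries of X'X
    b = sum(u * w for u, w in zip(x1, x2))
    c = sum(w * w for w in x2)
    lam, L = smallest_eigenpair_2x2(a, b, c)
    D = [L[0] * u + L[1] * w for u, w in zip(x1, x2)]   # D = XL
    return lam, L, sum(d * d for d in D)

# Made-up data, already centered so that each column sums to zero.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.1, 0.9, 2.0]
lam, L, dd = orthogonal_fit(x1, x2)
```

Choosing the smaller root gives the minimum of $D'D$; the larger root of the same characteristic equation gives the maximum.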

A fourth illustration uses symbolic derivatives in obtaining the principal components
of a total variance [5, 252]. The variable portion of the exponent of the
multivariate normal can be written Y'AY where Y is the column vector
$[y_1, \cdots, y_k]$ and A is a k by k matrix. We set this equal to a constant, say C,
and get the equation of the k dimensional ellipsoid. It is desired to locate the
extrema of this ellipsoid. To do this we find the extrema of Y'Y. Using the
Lagrange multiplier we have

(85)    $\phi = Y'Y + \lambda(C - Y'AY)$

so that

(86)    $\dfrac{\partial \phi}{\partial \langle Y \rangle} = J'Y + Y'J - \lambda(J'AY + Y'AJ),$

(87)    $\dfrac{\partial \langle \phi \rangle}{\partial Y} = YK' + YK - \lambda(AYK' + AYK),$

so that there results

(88)    $Y - \lambda AY = 0.$

Pre-multiplying by $A^{-1}$ we get

(89)    $(A^{-1} - \lambda)Y = 0$

and pre-multiplying by Y' gives the important relation

(90)    $Y'Y = \lambda C.$


A fifth illustration utilizes symbolic differentiation in developing the theory
of the linear discriminant function [6, 341] [8, 124]. As in the other illustrations,
the variates are measured about their means. The unknown multipliers are
indicated by the vector L. Then

(91)    $Z = XL$

is the general matrix equation while

(92)    $Z_1 = X_1L, \qquad Z_2 = X_2L$

are the corresponding equations for the two groups. Then

(93)    $\bar Z_1 = \bar X_1L, \quad \bar Z_2 = \bar X_2L,$ and $\bar Z_1 - \bar Z_2 = (\bar X_1 - \bar X_2)L = DL,$

(94)    $Z_1 - \bar Z_1 = (X_1 - \bar X_1)L = Y_1L, \qquad Z_2 - \bar Z_2 = (X_2 - \bar X_2)L = Y_2L.$

The within group variation, $L'Y_1'Y_1L + L'Y_2'Y_2L$, is then divided into the
between group variation, $L'D'DL$, to get

(95)    $G = \dfrac{L'D'DL}{L'Y_1'Y_1L + L'Y_2'Y_2L} = \dfrac{A}{B}.$

We wish to maximize G. Since A and B are scalars, $\dfrac{\partial \langle G \rangle}{\partial L} = 0$ reduces to

(96)    $\dfrac{\partial \langle B \rangle}{\partial L} = \dfrac{1}{G}\dfrac{\partial \langle A \rangle}{\partial L},$

which becomes, with further differentiation,

(97)    $(Y_1'Y_1 + Y_2'Y_2)L = \dfrac{D'DL}{G}.$

Since $\dfrac{DL}{G}$ is a scalar, we have

(98)    $(Y_1'Y_1 + Y_2'Y_2)L = cD'.$

Any convenient value of c can be used for purposes of discrimination. It is
customary to take $c = 1$ and then to adjust (98) so that some $l_i$ is unity.

A final illustration applies symbolic matrix differentiation to a theorem of
multiple factor analysis. This presentation parallels that given by Thurstone
[7, 473-477] for transforming any factorial matrix into a principal axes matrix.
The matrix

(99)    $F = [a_{ij}]$

has p rows and r columns, $r < p$, such that

(100)   $FF' = R$

where R is a $p \times p$ correlation matrix.

It is desired to apply the unitary orthogonal transformation L to F in such a
way as to produce a matrix, called $F_p$, which has the sums of the squares in
respective columns a maximum. This can be done by maximizing simultaneously
the diagonal terms of $F_p'F_p$ where

(101)   $F_p = FL.$

Again using Lagrange multipliers, we have

(102)   $\phi = L'F'FL + \lambda(I - L'L).$





This equation has the same analytical form as (79). Differentiation leads to
the result

(103)   $(F'F - \lambda)L = 0.$

The solution of (103) gives the value L which can be substituted in (101) to
obtain $F_p$.

14. Conclusion. Two types of symbolic matrix derivatives have been defined.
Laws have been developed for the basic operations of addition, subtraction,
multiplication, inverse, and powers. Laws for more extended functions
can be worked out on the basis of principles enunciated.

Applications are given to certain multivariate problems. It is our thesis that
with these differentiation formulas available, much work in multivariate analysis
can be carried on with a simple matrix notation.

REFERENCES 

[1] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices and Some Applications
to Dynamics and Differential Equations, Cambridge University Press, 1936.

[2] P. S. Dwyer, “A matrix presentation of least squares and correlation theory,” Annals
of Math. Stat., Vol. 15 (1944), pp. 82-89.

[3] A. D. Michal, Matrix and Tensor Calculus, New York, John Wiley and Sons, 1947.

[4] H. Hotelling, “Relations between two sets of variates,” Biometrika, Vol. 28 (1936),
pp. 321-377.

[5] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1946.

[6] M. G. Kendall, Advanced Theory of Statistics, Vol. II, London, Charles Griffin and Co.
Ltd., 1946.

[7] L. L. Thurstone, Multiple Factor Analysis, Chicago, University of Chicago Press, 1947.

[8] P. G. Hoel, Introduction to Mathematical Statistics, New York, John Wiley and Sons,
1947.



ON THE LIMITING DISTRIBUTIONS OF ESTIMATES BASED ON
SAMPLES FROM FINITE UNIVERSES¹

By William G. Madow 
Institute of Statistics, University of North Carolina 

1. Summary. The paper shows that under very broad conditions the usual 
theorems concerning the limiting distributions of estimates hold for estimates 
based on samples selected from finite universes, at random without replacement. 
It may be remarked that under the same conditions, the same conclusions are 
true for random sampling from finite universes with replacement, if the universes 
are permitted to change within the limitations set by condition W. 

2. Introduction. It has long been known that the limiting distribution of
arithmetic means of samples selected at random with replacement from finite
universes, or from infinite universes, is normal under very general conditions.
When, however, a sample is selected from a finite universe without replacement,
and the size of the sample as compared with that of the universe is too large for
the universe to be treated as infinite, the proof that the limiting distribution of
the mean is normal appears to have been given only for the case where the universe
is multinomial.² In this paper we prove that the limiting distribution of
the mean is normal provided only that as the universe increases in size, the higher
moments do not increase too rapidly as compared with the variance, and that
for sufficiently large sizes of sample and population the ratio of size of sample to
size of universe is bounded away from 1. Various extensions are given, but these
are almost immediate consequences of the theorem on the limiting distribution
of the mean.

The method used is that of showing that the moments of the standardized mean
tend to those of the normal distribution. In doing this we generalize a theorem
of Wald and Wolfowitz,³ by making it applicable to permutations of samples
from finite populations, and by reducing a little the conditions on the coefficients.
The theorem on the mean is then a simple corollary.

We also note that with these proofs on limiting distributions we can make the 
corresponding assertions concerning characteristic functions. Although no 
applications of this fact are given, it seems likely that some useful results could 
be obtained. 

3. Preliminary lemmas. In calculating the k-th moments and their limits we
shall use an infrequently given form of the multinomial expansion and some
properties of symmetric polynomials. In this section we make the necessary
definitions, and present four lemmas embodying the results we shall use.⁴

¹ Presented to the American Mathematical Society at a meeting held in New York City
on April 17, 1948.

² See F. N. David, “Limiting distributions connected with certain methods of sampling
human populations,” Stat. Res. Mem., Vol. 2 (1938), pp. 69-90, especially p. 77.

³ A. Wald and J. Wolfowitz, “Statistical tests based on permutations of the observations,”
Annals of Math. Stat., Vol. 15 (1944), pp. 358-372, especially p. 369.

A t-partition of a positive integer k consists of t positive integers $\alpha_1, \cdots, \alpha_t$
such that $\alpha_1 + \cdots + \alpha_t = k$. Two t-partitions $\alpha_1, \cdots, \alpha_t$ and $\beta_1, \cdots, \beta_t$ of
k will be said to be distinct if for at least one value of h we have $\alpha_h \ne \beta_h$.

Let $\varphi(\alpha_1, \cdots, \alpha_t)$, written $\varphi(\alpha)$, be any function of the t-partitions of k. By
$\Sigma_{1t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha_1, \cdots, \alpha_t)$ over all distinct
t-partitions of k.

By $\Sigma_{2t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct permutations
of $\alpha_1, \cdots, \alpha_t$.

By $\Sigma_{3t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct t-partitions
of k satisfying the condition $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_t$.

Let $\psi(\nu_1, \cdots, \nu_t)$ be any function of the variables $\nu_1, \cdots, \nu_t$. Then by
$\Sigma_{4n}\psi(\nu_1, \cdots, \nu_t)$ we shall mean the summation of $\psi(\nu_1, \cdots, \nu_t)$ over all possible
selections of t integers from 1 to n arranged so that $\nu_1 > \nu_2 > \cdots > \nu_t$.

The formula for the multinomial given below is not presented as a new result. 
It is given only as a means of referring to the result we need. 

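Lemma 1. Let $\xi_1, \cdots, \xi_n$ be any quantities or random variables and let k be a
positive integer. Then

        $(\xi_1 + \cdots + \xi_n)^k = \sum_{t=1}^{k} \Sigma_{1t}\, C^k_{\alpha_1,\cdots,\alpha_t}\, \Sigma_{4n}\, \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t},$

where

        $C^k_{\alpha_1,\cdots,\alpha_t} = \dfrac{k!}{\alpha_1! \cdots \alpha_t!}.$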

The proof is omitted. 
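
Although the proof is omitted, the expansion in Lemma 1 is easy to verify by brute force for small cases. The sketch below is an illustration only, under the reading that $\Sigma_{1t}$ runs over ordered t-tuples of positive integers summing to k and $\Sigma_{4n}$ over index sets written in decreasing order; the function names are hypothetical.

```python
from itertools import combinations
from math import factorial, prod

def compositions(k, t):
    """Yield every ordered t-tuple of positive integers summing to k."""
    if t == 1:
        if k >= 1:
            yield (k,)
        return
    for first in range(1, k - t + 2):
        for rest in compositions(k - first, t - 1):
            yield (first,) + rest

def multinomial(k, alphas):
    """k! / (alpha_1! ... alpha_t!), computed with exact integer division."""
    c = factorial(k)
    for a in alphas:
        c //= factorial(a)
    return c

def lemma1_rhs(xi, k):
    """Right-hand side of Lemma 1 for the list of quantities xi."""
    n = len(xi)
    total = 0
    for t in range(1, min(k, n) + 1):
        for alphas in compositions(k, t):
            coeff = multinomial(k, alphas)
            for idx in combinations(range(n), t):
                nu = idx[::-1]            # indices in decreasing order
                total += coeff * prod(xi[i] ** a for i, a in zip(nu, alphas))
    return total

xi = [2, -1, 3]
lhs = sum(xi) ** 4
rhs = lemma1_rhs(xi, 4)
```

Here `lhs` and `rhs` agree because every monomial of the ordinary multinomial expansion is counted exactly once, grouped by the number t of distinct indices it involves.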

The following lemma will be useful in connection with several of the results of
this section:

Lemma 2. If $\varphi(\alpha)$ is a function of the t-partitions of k, then

        $\Sigma_{1t}\varphi(\alpha) = \Sigma_{3t}\Sigma_{2t}\varphi(\alpha).$

The verbalization of the lemma is practically its proof.

Let us now define certain symmetric polynomials that we shall use.

Let $S_{\alpha_1,\cdots,\alpha_t} = \sum \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t}$ where the $\alpha$'s are positive integers and the
summation extends over all possible arrangements $\nu_1, \cdots, \nu_t$ of t of the integers
$1, \cdots, N$. Hence there will be $N^{(t)} = N(N - 1) \cdots (N - t + 1)$ terms in
$S_{\alpha_1,\cdots,\alpha_t}$.

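Lemma 3. Suppose that $t_1, \cdots, t_h$ are an h-partition of t, that

        $\alpha_{\gamma_{i-1}+1} = \cdots = \alpha_{\gamma_i}$, where $\gamma_i = t_1 + \cdots + t_i$ $(i = 1, \cdots, h;\ \gamma_0 = 0),$

and that

        $\alpha_{t_1} \ne \alpha_{t_1+1}, \ \cdots, \ \alpha_{t_1+\cdots+t_{h-1}} \ne \alpha_{t_1+\cdots+t_{h-1}+1}.$

Then, defining

(3.1)   $S'_{\alpha_1,\cdots,\alpha_t} = \Sigma_{2t}\Sigma_{4N}\, \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t},$

it follows that

        $S_{\alpha_1,\cdots,\alpha_t} = t_1! \cdots t_h!\, S'_{\alpha_1,\cdots,\alpha_t}.$

To prove Lemma 3, it is only necessary to note that each term of $S'_{\alpha_1,\cdots,\alpha_t}$ will
determine $t_1! \cdots t_h!$ equal terms of $S_{\alpha_1,\cdots,\alpha_t}$.

⁴ The order of sections 3 and 4 is largely a matter of taste; some may prefer to treat section
3 as an appendix to section 4 to be referred to when necessary.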

Although the moments that we shall obtain will be functions of $S_{\alpha_1,\cdots,\alpha_t}$, the
condition that we shall use on the moments can be interpreted directly only in
terms of $S_j$. Consequently, in order to be able to analyze the implications of
that condition on $S_{\alpha_1,\cdots,\alpha_t}$, we state the following lemma:

Lemma 4. The symmetric polynomial $S_{\alpha_1,\cdots,\alpha_t}$ is equal to a sum of products of
the form

        $\pm S_{\gamma_1}S_{\gamma_2} \cdots S_{\gamma_h}$

where $\gamma_1, \cdots, \gamma_h$ are an h-partition of k, $h \le t$, and each $\gamma$ is a sum of one or more
of the $\alpha$'s. Furthermore, if $S_1 = 0$, then $h \le [k/2]$ where $[k/2] = k/2$ if k is even
and $[k/2] = (k - 1)/2$ if k is odd. This follows from the result

(3.2)   $S_{\alpha_t} \cdot S_{\alpha_1,\cdots,\alpha_{t-1}} = S_{\alpha_1+\alpha_t,\alpha_2,\cdots,\alpha_{t-1}} + \cdots + S_{\alpha_1,\cdots,\alpha_{t-2},\alpha_{t-1}+\alpha_t} + S_{\alpha_1,\cdots,\alpha_t}.$

Proof: It is easy to prove (3.2) by comparing terms. Then the other assertions
follow from the repeated use of (3.2) and the resulting fact that each $\gamma$ is a
sum of one or more of the $\alpha$'s.
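
The recursion (3.2) can be confirmed by direct enumeration. The sketch below is illustrative and assumes the reading of $S_{\alpha_1,\cdots,\alpha_t}$ as a sum over all ordered arrangements of t distinct indices; the function names `S` and `identity_sides` are hypothetical.

```python
from itertools import permutations
from math import prod

def S(xi, alphas):
    """Sum of xi[v1]**a1 * ... * xi[vt]**at over all arrangements of t distinct indices."""
    return sum(prod(xi[v] ** a for v, a in zip(arr, alphas))
               for arr in permutations(range(len(xi)), len(alphas)))

def identity_sides(xi, alphas):
    """Return both sides of (3.2) for the exponents alphas (length at least 2)."""
    *head, last = alphas
    lhs = S(xi, (last,)) * S(xi, tuple(head))
    rhs = S(xi, tuple(alphas))             # the new index differs from all others
    for i in range(len(head)):             # the new index coincides with the i-th one
        merged = list(head)
        merged[i] += last
        rhs += S(xi, tuple(merged))
    return lhs, rhs

lhs_a, rhs_a = identity_sides([1, -2, 3, 5], (2, 1, 3))
# With centered values S_1 = 0, so the product on the left vanishes.
lhs_b, rhs_b = identity_sides([2, -3, 1, 0], (2, 1, 1))
```

The second call shows the mechanism used later in the paper: when $S_1 = 0$ and $\alpha_t = 1$, the polynomial $S_{\alpha_1,\cdots,\alpha_t}$ equals minus the sum of the merged polynomials.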

4. The limiting distribution. In this section we obtain the generalization of
the theorem of Wald and Wolfowitz to which reference was made above.

Let $U_1, U_2, \cdots, U_N, \cdots$ be a sequence of universes, the universe $U_N$ containing
the elements⁵ $x_{\nu N}$, and let the arithmetic mean of the elements of $U_N$ be
denoted by $\bar x_N$. Furthermore, let

        $\mu_{rN} = \dfrac{1}{N}\sum_{\nu} (x_{\nu N} - \bar x_N)^r.$

Let $C_1, C_2, \cdots, C_n, \cdots$ be a sequence of sets of coefficients, the set $C_n$ containing
the elements $c_{jn}$, and let the arithmetic mean of the elements of $C_n$ be
denoted by $\bar c_n$. We exclude the possibility that the elements of any $C_n$ all vanish,
and hence we can suppose that $\sum_j c_{jn}^2 = 1$. Furthermore, let

        $\mu'_{rn} = \dfrac{1}{n}\sum_{j} c_{jn}^r.$

Since $\sum_j (c_{jn} - \bar c_n)^2 \ge 0$, it follows that, if we define $A_n = n^{1/2}\bar c_n$, then $A_n^2 \le 1$.

⁵ The letter $\nu$ will assume all integral values from 1 to N. The letter r will assume all
positive integral values. The letter j will assume all integral values from 1 to n. The letter
t will assume all integral values from 1 to k. The symbol lim will stand for the limit as n or
N or both, as the case may be, increase without limit, it being understood that lim n/N < 1.
j 

Let n elements be selected at random without replacement from $U_N$ and let
us denote these elements by $x'_{jN}$, the subscript j indicating the order of selection,
i.e., $x'_{iN}$ is the i-th element of $U_N$ selected for the sample even though it may be
$x_{\nu N}$.

The linear function that we shall study is

        $z_n = c_{1n}x'_{1N} + \cdots + c_{nn}x'_{nN},$

i.e., the value of $z_n$ is determined by multiplying the j-th element selected for the
sample by $c_{jn}$ and summing for j. Then, since $Ex'_{iN} = \bar x_N$, we have

        $Ez_n = n\bar x_N \bar c_n.$


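Furthermore,

        $\sigma^2_{z_n} = \mu_{2N}\dfrac{N}{N-1}\left(1 - \dfrac{nA_n^2}{N}\right).$

To see this we first note that

        $\sum_{i \ne j} c_{in}c_{jn} = nA_n^2 - 1, \qquad E(x'_{iN} - \bar x_N)^2 = \mu_{2N},$

and, if $i \ne j$,

        $E(x'_{iN} - \bar x_N)(x'_{jN} - \bar x_N) = -\dfrac{\mu_{2N}}{N-1}.$

From the definition of variance we have

        $\sigma^2_{z_n} = E(z_n - Ez_n)^2 = \sum_{i,j=1}^{n} c_{in}c_{jn}E(x'_{iN} - \bar x_N)(x'_{jN} - \bar x_N),$

and making the indicated substitutions the result follows from a few simple
manipulations.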
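
For a small universe the mean and variance of $z_n$ can be computed exactly by listing every ordered sample drawn without replacement. The sketch below is an illustrative check, assuming $\mu_{2N}$ is the variance of the universe with divisor N and that the coefficients are scaled so that their squares sum to one; the universe, the coefficients, and the function names are made up.

```python
from itertools import permutations

def exact_mean_var(universe, coeffs):
    """Mean and variance of z_n over all equally likely ordered samples."""
    n = len(coeffs)
    values = [sum(c * x for c, x in zip(coeffs, sample))
              for sample in permutations(universe, n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def formula_mean_var(universe, coeffs):
    """E z_n = n * x_bar * c_bar and var = mu_2 * N/(N-1) * (1 - n*A_n^2/N)."""
    N, n = len(universe), len(coeffs)
    x_bar = sum(universe) / N
    mu2 = sum((x - x_bar) ** 2 for x in universe) / N
    c_bar = sum(coeffs) / n
    A2 = n * c_bar ** 2                   # A_n^2, with A_n = sqrt(n) * c_bar
    return n * x_bar * c_bar, mu2 * (N / (N - 1)) * (1 - n * A2 / N)

universe = [1.0, 4.0, 6.0, 9.0, 10.0]
coeffs = [0.6, 0.8, 0.0]                  # squares sum to 1, as the text supposes

m_exact, v_exact = exact_mean_var(universe, coeffs)
m_formula, v_formula = formula_mean_var(universe, coeffs)
```

The enumeration grows like $N^{(n)}$, so this is only practical for very small N and n; it is meant as a check on the algebra, not as a computing method.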

If we define $\bar x_n$ to be the arithmetic mean of $x'_{1N}, \cdots, x'_{nN}$, then $\bar x_n = z_n/\sqrt{n}$
with $\sqrt{n}\,c_{jn} = 1$ and, as is well known,

        $E\bar x_n = \bar x_N, \qquad \sigma^2_{\bar x_n} = \dfrac{\mu_{2N}}{n}\cdot\dfrac{N-n}{N-1}.$

Hence, if we can find the limiting distribution of

        $Z_n = \dfrac{z_n - Ez_n}{\sigma_{z_n}},$

then the limiting distribution of $(\bar x_n - \bar x_N)/\sigma_{\bar x_n}$ will be a special case.

We shall need to place some sort of limitation on the sequences $U_N$ and $C_n$ if
we are to obtain theorems on limiting distributions of statistics based on them.

The condition W that we shall use is satisfied by a slightly larger class of sequences
$U_N$ and $C_n$ than that of Wald and Wolfowitz because it does not rule out
the possibility that all the elements of $C_n$ should be equal. It should be noted,
however, that for their purposes this extension of the class of admissible sequences
$U_N$ and $C_n$ is vacuous since they required $n = N$, so that in their case if all the
elements of $C_n$ were equal, say $k/N$, we would have $z_N = k\bar x_N$ no matter in what
order the elements of $U_N$ were selected for the sample.

Condition W. The sequences $U_N$ and $C_n$ will satisfy the condition W if

        $\mu_{rN} = \mu_{2N}^{r/2}\lambda_r(N),$

        $\mu'_{rn} = n^{-r/2}\lambda_r(n),$

and

        $\dfrac{nA_n^2}{N} < 1 - \epsilon,$

for sufficiently large n and N, where a finite value $\lambda$ exists such that for all r

        $\sup |\lambda_r(N)| < \lambda, \qquad \sup |\lambda_r(n)| < \lambda,$

and $\epsilon > 0$.

(Note that if W is satisfied for all even values of r then W is also satisfied for all
odd values of r since $\mu_{r+2}\mu_r \ge \mu_{r+1}^2$.)

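A general theorem on moments is the following:

Theorem 1. Let $S_{\alpha_1,\cdots,\alpha_t}$ and $S'_{\alpha_1,\cdots,\alpha_t}$ be defined in terms of $x_{\nu N} - \bar x_N$
instead of $\xi_\nu$, and let $T'_{\alpha_1,\cdots,\alpha_t}$ be the same function of the $c_{jn}$ that $S'_{\alpha_1,\cdots,\alpha_t}$ is of the
$x_{\nu N} - \bar x_N$. Furthermore, let $E_k = EZ_n^k$. Then

(4.1)   $E_k = \dfrac{1}{\sigma_{z_n}^k}\sum_{t=1}^{k}\Sigma_{3t}\, C^k_{\alpha_1,\cdots,\alpha_t}\, \dfrac{S_{\alpha_1,\cdots,\alpha_t}\, T'_{\alpha_1,\cdots,\alpha_t}}{N^{(t)}}.$

Proof: From the definition of $Z_n$ and Lemma 1, it follows that

        $\sigma_{z_n}^k E_k = \sum_{t=1}^{k}\Sigma_{1t}\, C^k_{\alpha_1,\cdots,\alpha_t}\, \Sigma_{4n}\, c_{\nu_1 n}^{\alpha_1} \cdots c_{\nu_t n}^{\alpha_t}\, E(x'_{\nu_1 N} - \bar x_N)^{\alpha_1} \cdots (x'_{\nu_t N} - \bar x_N)^{\alpha_t}.$

Since we are selecting at random without replacement it follows that

        $N^{(t)} E(x'_{\nu_1 N} - \bar x_N)^{\alpha_1} \cdots (x'_{\nu_t N} - \bar x_N)^{\alpha_t} = S_{\alpha_1,\cdots,\alpha_t}.$

If we now use Lemma 2 to replace $\Sigma_{1t}$ by $\Sigma_{3t}\Sigma_{2t}$, we then obtain

        $\sigma_{z_n}^k E_k = \sum_{t=1}^{k}\Sigma_{3t}\, C^k_{\alpha_1,\cdots,\alpha_t}\, \dfrac{S_{\alpha_1,\cdots,\alpha_t}}{N^{(t)}}\, \Sigma_{2t}\Sigma_{4n}\, c_{\nu_1 n}^{\alpha_1} \cdots c_{\nu_t n}^{\alpha_t},$

since both $C^k_{\alpha_1,\cdots,\alpha_t}$ and $S_{\alpha_1,\cdots,\alpha_t}$ are invariant under permutations of $\alpha_1, \cdots, \alpha_t$.
Then from (3.1) and the definition of $T'_{\alpha_1,\cdots,\alpha_t}$, it follows that (4.1) is
proved.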


Our fundamental theorem is:

Theorem 2. If the sequences $U_N$ and $C_n$ satisfy the condition W, then

        $\lim E_{2j+1} = 0,$

and

        $\lim E_{2j} = \dfrac{(2j)!}{2^j j!},$

so that, for any a,

        $\lim P\{Z_n < a\} = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx.$


Proof: We wish to show that $\lim E_k$ exists and has the values given above.
First consider the parts of the typical term of $E_k$ that depend on n and N, i.e.,
the expression

        $B = \dfrac{S_{\alpha_1,\cdots,\alpha_t}\, T'_{\alpha_1,\cdots,\alpha_t}}{N^{(t)}\,\mu_{2N}^{k/2}\,(N/(N-1))^{k/2}\,(1 - nA_n^2/N)^{k/2}}.$

Since $\lim E_k$ will be the sum of the limits of a finite number of these terms, let
us first determine under what conditions B will tend to zero as n and N become
infinite.

From Lemma 4 it follows that

        $S_{\alpha_1,\cdots,\alpha_t} = \sum \pm S_{\gamma_1}S_{\gamma_2} \cdots S_{\gamma_h},$

where $\gamma_1 + \cdots + \gamma_h = \alpha_1 + \cdots + \alpha_t$ and each of the $\gamma$'s is the sum of one or more
of the $\alpha$'s. From the definition of $S_{\alpha_1,\cdots,\alpha_t}$ in terms of $x_{\nu N} - \bar x_N$ it follows that
$S_1 = 0$. Hence the minimum value of all $\gamma$'s in any non-vanishing term of the
summation is 2. Consequently we can say that for all non-vanishing terms $h \le [k/2]$
and $h \le t$. Finally if condition W is satisfied then

        $S_{\gamma_1} \cdots S_{\gamma_h} = N^h \mu_{2N}^{k/2}\lambda_h(N)$

where

        $\sup |\lambda_h(N)| < \lambda^h.$

Similarly

        $T_{\alpha_1,\cdots,\alpha_t} = \sum \pm T_{\gamma_1} \cdots T_{\gamma_g},$

where it may be that $T_1 \ne 0$ so that we cannot require $g \le [k/2]$ for the term
$T_{\gamma_1} \cdots T_{\gamma_g}$ to be non-vanishing. We still have, however, from Lemma 4 that
$g \le t$.

If condition W is satisfied, then

        $T_{\gamma_1} \cdots T_{\gamma_g} = n^{g-k/2}\lambda_g(n),$

where

        $\sup |\lambda_g(n)| < \lambda^g.$

Hence, from Lemma 4, the definitions of $\mu_{rN}$ and $\mu'_{rn}$ and condition W it follows
that B is a sum (the number of terms does not depend on n or N) of terms like

        $D = \dfrac{N^h n^{g-k/2}\lambda(N)\lambda'(n)}{N^{(t)}\,(N/(N-1))^{k/2}\,(1 - nA_n^2/N)^{k/2}},$

where

        $h \le [k/2], \quad h \le t, \quad g \le t,$

and

        $\sup |\lambda(N)| < \infty$ and $\sup |\lambda'(n)| < \infty.$

Since $h \le t$, it follows that if $g < k/2$ then $\lim D = 0$. Hence, a possibly non-vanishing
term must have $g \ge k/2$ and hence $t \ge k/2$ because $t \ge g$. Furthermore,
$t \ge g + h - k/2$, since $h - k/2 \le 0$ and $t \ge g$. Hence $t - h \ge g - k/2$.
Now, we can write

        $D = \dfrac{n^{g-k/2}}{N^{t-h}}\,\lambda(N, n),$

where

        $\sup |\lambda(N, n)| < \infty,$

since $nA_n^2/N < 1 - \epsilon$ for sufficiently large n and N.

Hence

        $\lim D = 0,$

unless, perhaps, when $g - k/2 = t - h$, i.e., $h - k/2 = t - g$. Since $h - k/2 \le 0$
and $t - g \ge 0$, it follows that we must have $h = k/2$ and $t = g$ for $\lim D$ to be
possibly not zero.

If k is odd, then $h \le (k - 1)/2$ and hence

        $\lim E_{2j+1} = 0,$

since all terms obtained by expanding it as above will tend to zero.
If k is even, say $k = 2j$, and $\lim D$ is possibly non-vanishing, then h must equal
j and we must have $\gamma_1 = \cdots = \gamma_j = 2$. Consequently, from Lemma 4, the
only possibly non-vanishing terms of $E_{2j}$ are those arising from the polynomials
$S_{\alpha_1,\cdots,\alpha_t}$ with $\alpha_1 = \cdots = \alpha_s = 2$, and $\alpha_{s+1} = \cdots = \alpha_t = 1$, so that
$2s + t - s = 2j$ or $t = 2j - s$, $s = 0, 1, \cdots, j$. For such values of $\alpha_1, \cdots, \alpha_t$
we have

        $C^k_{\alpha_1,\cdots,\alpha_t} = \dfrac{(2j)!}{2^s}.$

Furthermore, as shown below, in developing $S_{\alpha_1,\cdots,\alpha_t}$ by means of Lemma 4 the
coefficient of $S_2^j$ is

(4.2)   $(-1)^{j-s}\dfrac{(2j - 2s)!}{2^{j-s}(j - s)!}.$

Demonstration of (4.2): If $s = j$, then it follows from Lemma 4 that the
coefficient of $S_2^j$ is 1. If $s < j$, we use Lemma 4, and noting that $S_1 = 0$, we
obtain

        $S_{\alpha_1,\cdots,\alpha_t} = -S_{\alpha_1+\alpha_t,\alpha_2,\cdots,\alpha_{t-1}} - \cdots - S_{\alpha_1,\cdots,\alpha_{t-2},\alpha_{t-1}+\alpha_t},$

where, since $\alpha_t = 1$, we have $\alpha_1 + \alpha_t = \alpha_2 + \alpha_t = \cdots = \alpha_s + \alpha_t = 3$, and
$\alpha_{s+1} + \alpha_t = \cdots = \alpha_{t-1} + \alpha_t = 2$. Consequently of the $t - 1$ terms of the
above evaluation of $S_{\alpha_1,\cdots,\alpha_t}$, exactly s will have $\alpha$'s $> 2$ and $t - s - 1$ will be
of the same form as $S_{\alpha_1,\cdots,\alpha_t}$ except that instead of s of the $\alpha$'s being 2 we have
$s + 1$ of the $\alpha$'s equal 2. For each such S we repeat the process obtaining

        $S_{\alpha_1,\cdots,\alpha_t} = (-1)^{(t-s)/2}(t - s - 1)(t - s - 3) \cdots 3 \cdot 1 \cdot S_2^j$ + terms which have $h < j$.

Consequently (4.2) provides the coefficient of $S_2^j$ in $S_{\alpha_1,\cdots,\alpha_t}$. Since the other
terms of $S_{\alpha_1,\cdots,\alpha_t}$ have $h < j$, they lead to terms of $E_{2j}$ that vanish in the limit.

Furthermore, by Lemma 3, $T_{\alpha_1,\cdots,\alpha_t} = T'_{\alpha_1,\cdots,\alpha_t}\, s!\,(t - s)!$ and the only term
of $T_{\alpha_1,\cdots,\alpha_t}$ for which $g = t$ is

        $T_2^s T_1^{t-s} = n^{(t-s)/2}A_n^{t-s}.$

The other terms of $T_{\alpha_1,\cdots,\alpha_t}$ will lead to terms of $E_{2j}$ that vanish in the limit since
$g < t$. Consequently, eliminating terms known to tend to zero as n and N become
infinite, we see that $E_{2j} - f(n, N)$ tends to zero as n and N become infinite,
where

        $f(n, N) = \sum_{s=0}^{j} \dfrac{(2j)!}{2^s} \cdot \dfrac{(-1)^{j-s}(2j - 2s)!}{2^{j-s}(j - s)!} \cdot \dfrac{N^j n^{j-s}A_n^{2j-2s}}{s!\,(2j - 2s)!\,N^{(2j-s)}\,(N/(N-1))^j\,(1 - nA_n^2/N)^j}.$

Now as n and N become infinite with $n \le N$, we see that

        $\lim f(n, N) = \lim \dfrac{(2j)!}{2^j j!}\sum_{s=0}^{j} (-1)^{j-s}\binom{j}{s}\dfrac{(nA_n^2/N)^{j-s}}{(1 - nA_n^2/N)^j} = \dfrac{(2j)!}{2^j j!},$

i.e.,

        $\lim E_{2j} = \dfrac{(2j)!}{2^j j!}.$

To complete the proof it is only necessary to note that the normal distribution is
completely determined by its moments.⁶
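
The limit of $f(n, N)$ can be explored numerically. The sketch below is an illustration under stated assumptions: it evaluates $f(n, N)$ in exact rational arithmetic, assembling it from the three ingredients named in the proof (the coefficient $(2j)!/2^s$, the coefficient (4.2), and the leading term of $T'$), and compares the result with the normal moment $(2j)!/(2^j j!)$ for a large universe. The function names and the chosen values of n, N and $A_n^2$ are hypothetical.

```python
from fractions import Fraction
from math import factorial

def falling(N, t):
    """N^(t) = N (N - 1) ... (N - t + 1)."""
    out = 1
    for i in range(t):
        out *= N - i
    return out

def normal_moment(j):
    """(2j)! / (2^j j!), the 2j-th moment of the standard normal distribution."""
    return factorial(2 * j) // (2 ** j * factorial(j))

def f(n, N, A2, j):
    """f(n, N) from the proof of Theorem 2, with A2 standing for A_n^2."""
    A2 = Fraction(A2)
    q = Fraction(n) * A2 / N
    norm = (Fraction(N, N - 1) * (1 - q)) ** j
    total = Fraction(0)
    for s in range(j + 1):
        m = j - s
        c = Fraction(factorial(2 * j), 2 ** s)                        # C for s twos
        coef = (-1) ** m * Fraction(factorial(2 * m), 2 ** m * factorial(m))  # (4.2)
        t_prime = Fraction(n) ** m * A2 ** m / (factorial(s) * factorial(2 * m))
        total += c * coef * N ** j * t_prime / falling(N, 2 * j - s)
    return total / norm

# Sample mean case (A_n = 1) with half of a large universe sampled.
f2 = float(f(500_000, 1_000_000, 1, 2))
f3 = float(f(500_000, 1_000_000, 1, 3))
```

For these inputs `f2` and `f3` are close to 3 and 15, the fourth and sixth moments of the standard normal distribution; the small discrepancy is of order 1/N.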


⁶ See, for example, M. G. Kendall, The Advanced Theory of Statistics, Vol. I, London,
Charles Griffin and Company, page 110.


Since Theorem 2 is a generalization of the theorem of Wald and Wolfowitz,
it is possible to generalize slightly all the applications they make of their theorem.
The statements of these generalizations are omitted.

The application of Theorem 2 that led to this paper is the following: Suppose
that $c_{jn} = n^{-1/2}$. Then the sequence $C_n$ satisfies W and $A_n = 1$. Consequently
we have proved

Corollary 1. If the sequence $U_N$ satisfies the condition W and if $\bar x_n$ is the arithmetic
mean of a sample of n elements selected at random without replacement from
$U_N$, then, for all a,

        $\lim P\left\{\dfrac{n^{1/2}(\bar x_n - \bar x_N)}{\mu_{2N}^{1/2}(1 - n/N)^{1/2}} < a\right\} = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx,$

provided that $\epsilon > 0$ exists such that $n/N < 1 - \epsilon$, if n and N are sufficiently large.

Now the sequence of $U_N$ will certainly satisfy W if $U_N$ has the same moments
for all values of N, or if the moments of $U_N$ tend to fixed values as N increases,
or if the universe $U_N$ is a random sample of a universe having these properties.
Consequently Theorem 2 and its corollaries will be valid for many applications,
among them being the case studied by F. N. David⁷ when $U_N$ has the same multinomial
distribution for each value of N.

The condition W is immediately satisfied for large classes of changing universes.
For example, if the elements of all $U_N$ are uniformly bounded and

        $\lim \mu_{2N} \ne 0,$

then the condition W is satisfied. As an illustration, consider the case where
$U_N$ contains $Np_N$ elements having the value one and $N(1 - p_N)$ elements having
the value zero. Then

        $\mu_{2N} = p_N(1 - p_N),$

and

        $\mu_{rN} = \dfrac{1}{N}\left[Np_N(1 - p_N)^r + N(1 - p_N)(-p_N)^r\right] = p_N(1 - p_N)^r + (-1)^r(1 - p_N)p_N^r.$

Hence

        $\dfrac{\mu_{rN}}{\mu_{2N}^{r/2}} = \dfrac{(1 - p_N)^{r/2}}{p_N^{r/2-1}} + (-1)^r\dfrac{p_N^{r/2}}{(1 - p_N)^{r/2-1}},$

so that condition W will be satisfied if $\epsilon > 0$ exists such that $\epsilon < p_N < 1 - \epsilon$
for all sufficiently large N.

Hence the limiting distribution of $Z_n$ will be normal no matter how the proportions
$p_N$ change provided only that the universe $U_N$ does not come to consist
essentially only of zeros or only of ones.
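
The ratio $\mu_{rN}/\mu_{2N}^{r/2}$ for the zero-one universe is simple to tabulate. The following sketch, an illustration only, computes it both from an explicit list of zeros and ones and from the closed form above, and shows how it grows when the proportion of ones approaches zero; the function names and sample proportions are assumptions.

```python
def central_moment(values, r):
    """r-th moment about the mean, with divisor N."""
    N = len(values)
    mean = sum(values) / N
    return sum((x - mean) ** r for x in values) / N

def ratio_from_universe(N, ones, r):
    """mu_r / mu_2^(r/2) for a universe of `ones` ones and N - ones zeros."""
    values = [1.0] * ones + [0.0] * (N - ones)
    return central_moment(values, r) / central_moment(values, 2) ** (r / 2)

def ratio_closed_form(p, r):
    """The closed form for mu_r / mu_2^(r/2) in terms of the proportion p of ones."""
    return ((1 - p) ** (r / 2) / p ** (r / 2 - 1)
            + (-1) ** r * p ** (r / 2) / (1 - p) ** (r / 2 - 1))

balanced = ratio_from_universe(1000, 500, 4)      # p = 0.5
skewed = ratio_from_universe(1000, 300, 4)        # p = 0.3
nearly_all_zeros = ratio_closed_form(0.001, 4)    # p close to 0
```

With p bounded away from 0 and 1 the ratio stays moderate, while for p = 0.001 the fourth-moment ratio is already close to 1000, which is the behaviour that condition W excludes.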


⁷ Op. cit.


Various multivariate extensions of Theorem 2 are immediate. For example:

Theorem 3. Suppose that the elements of $U_N$ are vectors of two components,⁸
$(x_{\nu N1}, x_{\nu N2})$, and that the condition W is satisfied by the sequences $C_n$, $U_{N1}$, and
$U_{N2}$ where $U_{Nh}$, $h = 1, 2$, contains the elements $x_{\nu Nh}$.

Let

        $z_{nh} = \sum_j c_{jn}x'_{jNh},$

and let

        $Z_{nh} = \dfrac{z_{nh} - Ez_{nh}}{\sigma_{z_{nh}}},$

where the random variables $x'_{jNh}$ are defined as were $x'_{jN}$.

Let

        $\rho_N = \dfrac{\frac{1}{N}\sum_\nu (x_{\nu N1} - \bar x_{N1})(x_{\nu N2} - \bar x_{N2})}{(\mu_{2N1}\,\mu_{2N2})^{1/2}}$

and suppose that $\lim \rho_N$ exists and is equal to $\rho$ where $\rho > -1 + \epsilon$. Then, the
limiting distribution of $Z_{n1}$ and $Z_{n2}$ is bivariate normal with means 0, variances 1,
and correlation coefficient $\rho$.


Proof: To prove Theorem 3 we shall show that any linear function
$t_1Z_{n1} + t_2Z_{n2}$ will be normally distributed in the limit if $t_1$ and $t_2$ are not both
zero. It will then follow⁹ that the theorem is true.

If we define $\bar U_N$ to be the sequence whose elements are

        $x_{\nu N} = \dfrac{t_1(x_{\nu N1} - \bar x_{N1})}{\mu_{2N1}^{1/2}} + \dfrac{t_2(x_{\nu N2} - \bar x_{N2})}{\mu_{2N2}^{1/2}},$

then the arithmetic mean of $\bar U_N$ is zero. Let

        $z_n = \sum_j c_{jn}x'_{jN},$

and let

        $Z_n = \dfrac{z_n - Ez_n}{\sigma_{z_n}}.$

Then, it is readily verified that

        $Z_n = \dfrac{t_1Z_{n1} + t_2Z_{n2}}{\sigma_{t_1Z_{n1}+t_2Z_{n2}}}.$

⁸ The generalization holds for any finite number of components but, to simplify the discussion,
is stated for two components only. The method used is due to H. Cramér, Random
Variables and Probability Distributions, Cambridge University Press, London, 1937, p. 106.

⁹ H. Cramér, Random Variables and Probability Distributions, Cambridge University
Press, London, 1937, p. 105.


Consequently, to prove that $t_1Z_{n1} + t_2Z_{n2}$ has a normal limiting distribution, we
need to verify that the sequence $\bar U_N$ satisfies the condition W if $U_{N1}$ and $U_{N2}$ do.
The moments of $\bar U_N$ are

        $\mu_{rN} = \dfrac{1}{N}\sum_\nu x_{\nu N}^r,$

so that

        $\mu_{2N} = t_1^2 + t_2^2 + 2t_1t_2\rho_N,$

where $\rho_N$ has the usual form of the correlation coefficient. Furthermore, using
the binomial expansion, we have

(4.4)   $\mu_{rN} = \sum_{\alpha} C^r_\alpha\, t_1^\alpha t_2^{r-\alpha}\, \dfrac{\mu_{\alpha,r-\alpha,N}}{\mu_{2N1}^{\alpha/2}\,\mu_{2N2}^{(r-\alpha)/2}},$

where

        $\mu_{\alpha,r-\alpha,N} = \dfrac{1}{N}\sum_\nu (x_{\nu N1} - \bar x_{N1})^\alpha (x_{\nu N2} - \bar x_{N2})^{r-\alpha}.$

Then, by the Cauchy-Schwarz inequality we have

        $\left|\sum_\nu (x_{\nu N1} - \bar x_{N1})^\alpha (x_{\nu N2} - \bar x_{N2})^{r-\alpha}\right| \le \left[\sum_\nu (x_{\nu N1} - \bar x_{N1})^{2\alpha} \cdot \sum_\nu (x_{\nu N2} - \bar x_{N2})^{2r-2\alpha}\right]^{1/2}$

so that

        $|\mu_{\alpha,r-\alpha,N}| \le \mu_{2\alpha,N1}^{1/2}\,\mu_{2r-2\alpha,N2}^{1/2},$

and using condition W for $U_{N1}$ and $U_{N2}$, we have

        $\mu_{2\alpha,N1} \le \mu_{2N1}^{\alpha}\lambda(N), \qquad \mu_{2r-2\alpha,N2} \le \mu_{2N2}^{r-\alpha}\lambda(N).$

Hence, substituting in (4.4) we see that

        $\sup |\mu_{rN}| < \infty.$

Hence the sequence $\bar U_N$ satisfies the condition W for all $t_1$ and $t_2$, and Theorem
3 is proved.

From Theorem 3, it then follows that the theorems on the limiting distributions
of moments, product moments and functions of moments¹⁰ are valid for
sampling from finite universes, at random without replacement.

¹⁰ The most important of these theorems are given in H. Cramér, Mathematical Methods
of Statistics, Princeton University Press, Princeton, 1946, sections 28.2-28.4, pp. 364-367.



A NON-PARAMETRIC TEST OF INDEPENDENCE¹

By Wassily Hoeffding 
Institute of Statistics , University of North Carolina 

1. Summary. A test is proposed for the independence of two random variables
with continuous distribution function (d.f.). The test is consistent with respect
to the class $\Omega''$ of d.f.'s with continuous joint and marginal probability densities
(p.d.). The test statistic D depends only on the rank order of the observations.
The mean and variance of D are given and $\sqrt{n}(D - ED)$ is shown to have a
normal limiting distribution for any parent distribution. In the case of independence
this limiting distribution is degenerate, and $nD$ has a non-normal
limiting distribution whose characteristic function and cumulants are given.
The exact distribution of D in the case of independence for samples of size
$n = 5, 6, 7$ is tabulated. In the Appendix it is shown that there do not exist
tests of independence based on ranks which are unbiased on any significance
level with respect to the class $\Omega''$. It is also shown that if the parent distribution
belongs to $\Omega''$ and for some $n \ge 5$ the probabilities of the $n!$ rank permutations
are equal, the random variables are independent.

2. Introduction. In a non-parametric test of a statistical hypothesis we do
not make any assumptions about the functional form of the population distribution.
A general theory of non-parametric tests is not yet developed, and a
satisfactory definition of “best” non-parametric tests does not seem to be available.
Desirable properties of a “good” non-parametric test are unbiasedness and
consistency. A test of a hypothesis $H_0$ is said to be consistent with respect to a
specified class of admissible hypotheses if the probability of accepting $H_0$ tends
to zero with increasing sample size whenever a hypothesis $\ne H_0$ of this class
is true.

In this paper we consider the problem of testing the independence of two
random variables X, Y on the basis of a random sample of size n. In all that
follows the d.f. $F(x, y)$ of (X, Y) is assumed to be continuous. We will denote
by $\Omega'$ the class of continuous d.f.'s $F(x, y)$ and by $\Omega''$ the class of d.f.'s having
continuous joint and marginal p.d.'s,

        $f(x, y) = \partial^2 F(x, y)/\partial x\,\partial y, \quad f_1(x) = \int f(x, y)\,dy, \quad f_2(y) = \int f(x, y)\,dx.$

The hypothesis $H_0$ to be tested is that $F(x, y)$ is of the form

        $F(x, y) = F(x, \infty)F(\infty, y).$

Several tests of this hypothesis have been proposed. Among them those
deserve particular attention which depend only on the rank order of the observations.
They will be referred to as rank tests. The critical region of a rank
test of independence with respect to the class $\Omega'$ is similar to the sample space;
the rank tests share this property with other tests obtained by the method of
randomization (cf. Scheffé [1]). A characteristic feature of a rank test is that it
remains invariant under order preserving transformations of X or Y.

¹ Research under a contract with the Office of Naval Research for development of multivariate
statistical theory.

Rank tests of independence have been studied by Hotelling and Pabst [2],
Kendall [3] and Wolfowitz [4]. While nothing is yet known about the power of
the last test, the author [5] has shown that the two former tests are asymptotically
biased for certain alternatives belonging to $\Omega'$. By a slight modification of the
examples given in [5] it can be shown that these tests are asymptotically biased
even with respect to the class $\Omega''$.

In the Appendix it is shown that there do not exist rank tests of independence
which are unbiased on any level of significance with respect to the classes $\Omega'$
or $\Omega''$. It will appear from this paper that there do exist rank tests of independence
which are consistent, and hence asymptotically unbiased, at least with
respect to $\Omega''$.

3. The Functional $\Delta(F)$. Given a random sample from a population with a
d.f. belonging to a class $\Omega$, we want to test the hypothesis $H_0$ that $F$ is in a
subclass $\omega$ of $\Omega$. It is easy to construct a consistent test of $H_0$ if there exist (a) a
functional $\theta(F)$ defined for every $F$ in $\Omega$ and such that $\theta(F) = 0$ if and only if
$F \in \omega$; and (b) a consistent estimate of $\theta(F)$. There are many ways of devising
consistent tests of independence by this method. The particular test described
in the sequel has been chosen mainly for its relative simplicity.

If $F(x, y)$ is a bivariate d.f., let

$$D(x, y) = F(x, y) - F(x, \infty)\,F(\infty, y)$$

and

(3.1) $$\Delta = \Delta(F) = \int D^2(x, y)\,dF(x, y).$$

Here and in the following, when no domain of integration is indicated, the
(Lebesgue-Stieltjes) integral is extended over the entire space (here $R_2$).

The random variables $X$, $Y$ with the d.f. $F(x, y)$ are independent if and only
if $D(x, y) \equiv 0$.

Theorem 3.1. When $F(x, y)$ belongs to $\Omega''$, $\Delta(F) = 0$ if and only if $D(x, y) \equiv 0$.

Proof. Evidently $D(x, y) \equiv 0$ implies $\Delta(F) = 0$.

Now suppose that $D(x, y) \not\equiv 0$. Since $F(x, y)$ is in $\Omega''$, the function $d(x, y) =
f(x, y) - f_1(x) f_2(y)$ is continuous. We have

$$D(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} d(u, v)\,du\,dv.$$

$D(x, y) \not\equiv 0$ implies $d(x, y) \not\equiv 0$, and since

$$\iint d(x, y)\,dx\,dy = 0,$$

there exists a rectangle $Q$ in $R_2$ such that $d(x, y) > 0$ if $(x, y)$ is in $Q$. Hence
$D(x, y) \neq 0$ almost everywhere in $Q$, and $f(x, y) > 0$ in $Q$. Thus

$$\Delta(F) \geq \iint_Q D^2(x, y)\,f(x, y)\,dx\,dy > 0.$$

This completes the proof.

If $F(x, y)$ is discontinuous, we can have $\Delta(F) = 0$ and $D(x, y) \not\equiv 0$. This is,
for instance, the case for the distribution

$$P\{X = 0, Y = 1\} = P\{X = 1, Y = 0\} = \tfrac{1}{2}.$$

The question remains open whether $\Delta = 0$ implies $D(x, y) \equiv 0$ if $F(x, y)$ is
continuous or absolutely continuous.
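
The two-point counterexample above can be checked by direct summation, since for a discrete distribution the integral defining $\Delta(F)$ reduces to a finite sum over the mass points. The following is a minimal sketch; the names `masses`, `F`, `D` and `delta` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Two-point distribution from the text: P{X=0, Y=1} = P{X=1, Y=0} = 1/2.
masses = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}
INF = float("inf")


def F(x, y):
    """Joint d.f. F(x, y) = P{X <= x, Y <= y} of the discrete distribution."""
    return sum(p for (u, v), p in masses.items() if u <= x and v <= y)


def D(x, y):
    """D(x, y) = F(x, y) - F(x, inf) * F(inf, y)."""
    return F(x, y) - F(x, INF) * F(INF, y)


def delta():
    """Delta(F) = integral of D^2 dF; a finite sum over the mass points."""
    return sum(p * D(x, y) ** 2 for (x, y), p in masses.items())


print(delta())   # D vanishes at both mass points, so Delta(F) = 0
print(D(0, 0))   # -1/4: D is not identically zero
```

D vanishes at the two points that carry mass, which is why the integral cannot detect the dependence.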

In Section 7 it will be shown that

$$0 \leq \Delta \leq \tfrac{1}{30}.$$

The upper bound is attained when $F(x, y)$ is the (continuous) d.f. of a
random variable $(X, Y)$ such that $X$ has any continuous d.f. and $Y = X$ (or,
more generally, $Y$ is a monotone function of $X$).

Let

$$C(u) = \begin{cases} 1 & \text{if } u \geq 0, \\ 0 & \text{if } u < 0, \end{cases}$$

(3.2) $$\psi(x_1, x_2, x_3) = C(x_1 - x_2) - C(x_1 - x_3),$$

$$\phi(x_1, y_1; \dots; x_5, y_5) = \tfrac{1}{4}\,\psi(x_1, x_2, x_3)\,\psi(x_1, x_4, x_5)\,\psi(y_1, y_2, y_3)\,\psi(y_1, y_4, y_5).$$

Then we can write

(3.3) $$\Delta = \int \cdots \int \phi(x_1, y_1; \dots; x_5, y_5)\,dF(x_1, y_1) \cdots dF(x_5, y_5).$$

4. The Statistic D. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from
a population with the d.f. $F(x, y)$, $n \geq 5$, and let

(4.1) $$D = D_n = \frac{1}{n(n-1)\cdots(n-4)} \sum{}'' \phi(X_{\alpha_1}, Y_{\alpha_1}; \dots; X_{\alpha_5}, Y_{\alpha_5}),$$

where $\sum''$ denotes summation over all $\alpha$ such that

$$\alpha_i = 1, \dots, n; \quad \alpha_i \neq \alpha_j \text{ if } i \neq j \quad (i, j = 1, \dots, 5).$$

Since the number of terms in $\sum''$ is $n(n-1)\cdots(n-4)$, we have by (3.3),
(4.2) $$ED = \Delta.$$
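
The definition (4.1) can be evaluated literally for very small samples by summing the kernel $\phi$ over all ordered 5-tuples of distinct sample members. The sketch below is an illustration under that reading of (3.2) and (4.1); the function names are hypothetical, and the cost grows like $n^5$, so it is useful only as a reference implementation.

```python
from fractions import Fraction
from itertools import permutations


def C(u):
    """Step function of (3.2): 1 if u >= 0, else 0."""
    return 1 if u >= 0 else 0


def psi(x1, x2, x3):
    """psi(x1, x2, x3) = C(x1 - x2) - C(x1 - x3)."""
    return C(x1 - x2) - C(x1 - x3)


def phi(points):
    """Kernel phi of (3.2) for five (x, y) pairs; includes the factor 1/4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5) = points
    product = (psi(x1, x2, x3) * psi(x1, x4, x5)
               * psi(y1, y2, y3) * psi(y1, y4, y5))
    return Fraction(product, 4)


def d_statistic_bruteforce(sample):
    """D_n of (4.1): average of phi over all ordered 5-tuples of distinct members."""
    n = len(sample)
    if n < 5:
        raise ValueError("D_n is defined only for n >= 5")
    total = sum(phi(t) for t in permutations(sample, 5))
    return total / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


# Perfectly monotone sample: attains the upper bound 1/30.
print(d_statistic_bruteforce([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]))
```

`itertools.permutations(sample, 5)` yields exactly the $n(n-1)\cdots(n-4)$ ordered selections over which $\sum''$ runs.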

Since in the case of independence $ED = 0$, $D$ can assume both positive and
negative values. It will be seen in Section 7 that $-\tfrac{1}{60} \leq D_n \leq \tfrac{1}{30}$, the upper
bound being attained for every $n$, while the minimum of $D_n$ apparently
increases with $n$.





The random variable $D$ as defined by (4.1) belongs to the class of U-statistics
considered by the author [5]. The following properties of $D$ follow immediately
from the results of that paper:

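I. Let

$$\Phi(x_1, y_1; \dots; x_5, y_5) = \frac{1}{5!} \sum \phi(x_{\alpha_1}, y_{\alpha_1}; \dots; x_{\alpha_5}, y_{\alpha_5}),$$

$$\Phi_k(x_1, y_1; \dots; x_k, y_k) = \int \cdots \int \Phi(x_1, y_1; \dots; x_k, y_k; x_{k+1}, y_{k+1}; \dots; x_5, y_5)\,dF(x_{k+1}, y_{k+1}) \cdots dF(x_5, y_5), \quad (k = 1, \dots, 5),$$

$$\zeta_k = \int \cdots \int \{\Phi_k(x_1, y_1; \dots; x_k, y_k) - \Delta\}^2\,dF(x_1, y_1) \cdots dF(x_k, y_k).$$

Then the variance of $D_n$ is

(4.3) $$\operatorname{var} D_n = \binom{n}{5}^{-1} \sum_{k=1}^{5} \binom{5}{k}\binom{n-5}{5-k}\,\zeta_k.$$

We have

$$25\,\zeta_1 \leq n \operatorname{var} D_n \leq 5\,\zeta_5.$$

$n \operatorname{var} D_n$ is a decreasing function of $n$, and
(4.4) $$\lim_{n \to \infty} n \operatorname{var} D_n = 25\,\zeta_1.$$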

II. By Theorem 7.1, [5], the random variable $\sqrt{n}\,(D_n - \Delta)$ has a normal
limiting distribution with mean zero and variance $25\,\zeta_1$.

It will be seen in section 6 that in the case of independence $\zeta_1 = 0$, so that
the normal limiting distribution of $\sqrt{n}\,D_n$ is a degenerate one. In this case
$nD_n$ has a non-normal limiting distribution. (See section 8).


5. Computation of D. From (4.1) and (3.2) we get after reduction

(5.1) $$D = \frac{A - 2(n-2)B + (n-2)(n-3)C}{n(n-1)(n-2)(n-3)(n-4)},$$

where

(5.2) $$A = \sum_{\alpha=1}^{n} a_\alpha(a_\alpha - 1)\,b_\alpha(b_\alpha - 1), \quad
B = \sum_{\alpha=1}^{n} (a_\alpha - 1)(b_\alpha - 1)\,c_\alpha, \quad
C = \sum_{\alpha=1}^{n} c_\alpha(c_\alpha - 1),$$

and

$$a_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta) - 1, \quad
b_\alpha = \sum_{\beta=1}^{n} C(Y_\alpha - Y_\beta) - 1, \quad
c_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta)\,C(Y_\alpha - Y_\beta) - 1.$$

$a_\alpha + 1$ and $b_\alpha + 1$ are the ranks of $X_\alpha$ and $Y_\alpha$, respectively. $c_\alpha$ is the number
of sample members $(X_\beta, Y_\beta)$ for which both $X_\beta < X_\alpha$ and $Y_\beta < Y_\alpha$. (Since
$F(x, y)$ is continuous we may assume that $X_\alpha \neq X_\beta$ and $Y_\alpha \neq Y_\beta$ if $\alpha \neq \beta$.)

Thus, to compute $D$ for a given sample we have to determine the numbers
$a_\alpha$, $b_\alpha$, $c_\alpha$ for each sample member, calculate $A$, $B$, $C$ from (5.2) and insert
them in (5.1).
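
The recipe just described translates almost line by line into code. The following is a minimal sketch, assuming a sample without ties (as the continuity of $F$ guarantees with probability one); the function name `hoeffding_d` is illustrative, and exact rational arithmetic is used so that the result can be compared with the tabulated values.

```python
from fractions import Fraction


def hoeffding_d(xs, ys):
    """D of (5.1)-(5.2) for paired observations without ties."""
    n = len(xs)
    if n != len(ys) or n < 5:
        raise ValueError("need two sequences of equal length n >= 5")
    A = B = C = 0
    for i in range(n):
        # a + 1 and b + 1 are the ranks of xs[i] and ys[i].
        a = sum(1 for j in range(n) if xs[j] < xs[i])
        b = sum(1 for j in range(n) if ys[j] < ys[i])
        # c counts the members lying below the i-th one in both coordinates.
        c = sum(1 for j in range(n) if xs[j] < xs[i] and ys[j] < ys[i])
        A += a * (a - 1) * b * (b - 1)
        B += (a - 1) * (b - 1) * c
        C += c * (c - 1)
    numerator = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return Fraction(numerator, denominator)


print(hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1/30, the upper bound
print(hoeffding_d([1, 2, 3, 4, 5], [1, 4, 3, 2, 5]))  # -1/60
```

The counting loops cost $O(n^2)$ in total, against $O(n^5)$ for a literal evaluation of (4.1).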


6. The variance of D in the case of independence. Since $F(x, y)$ is assumed
to be continuous, so are $F(x, \infty)$ and $F(\infty, y)$. The inequalities $x_1 < x_2$ and
$F(x_1, \infty) < F(x_2, \infty)$ are then equivalent unless $F(x_1, \infty) = F(x_2, \infty)$. The
same is true of $y_1 < y_2$ and $F(\infty, y_1) < F(\infty, y_2)$. This shows that the function $\phi$,
(3.2), does not change its value if $x_i$, $y_i$ is replaced by $F(x_i, \infty)$, $F(\infty, y_i)$, except
perhaps on a set of zero probability. Hence $\Delta$ and $D$ are invariant under the
transformation

$$u = F(x, \infty), \quad v = F(\infty, y); \qquad U = F(X, \infty), \quad V = F(\infty, Y).$$

In the case of independence we have $F(x, y) = uv$, and

$$\zeta_k = \int_0^1 \cdots \int_0^1 \{\Phi_k^*(u_1, v_1; \dots; u_k, v_k)\}^2\,du_1\,dv_1 \cdots du_k\,dv_k,$$

where $\Phi_k^*$ is defined as $\Phi_k$, with $x_i$, $y_i$ and $F(x_i, y_i)$ replaced by $u_i$, $v_i$ and $u_i v_i$
respectively. On evaluation of these definite integrals we get

$$\zeta_1 = 0, \quad 200 \cdot 30^2\,\zeta_2 = \tfrac{2}{9}, \quad 600 \cdot 30^2\,\zeta_3 = \tfrac{14}{3}, \quad
600 \cdot 30^2\,\zeta_4 = \tfrac{164}{9}, \quad 120 \cdot 30^2\,\zeta_5 = 12.$$

On inserting these values in (4.3) we obtain

(6.1) $$\operatorname{var}(30D) = \frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)}.$$
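
As a numerical cross-check, formula (6.1) can be compared with the variances implied by the exact null distributions in Table I. The sketch below hard-codes the counts for n = 5 and n = 6 as read from that table; the helper names are illustrative only.

```python
from fractions import Fraction


def var_30d(n):
    """Right-hand side of (6.1): variance of 30*D_n under independence."""
    return Fraction(2 * (n * n + 5 * n - 32),
                    9 * n * (n - 1) * (n - 3) * (n - 4))


# Null distributions from Table I: scaled value -> number of ranking classes.
TABLE_N5 = {-1: 2, 0: 12, 2: 1}                               # values of 60*D_5
TABLE_N6 = {-2: 4, -1: 28, 0: 36, 1: 16, 2: 1, 3: 4, 6: 1}    # values of 180*D_6


def variance_of_30d_from_table(table, scale):
    """Variance of 30*D_n computed from a table of values of scale*D_n."""
    total = sum(table.values())
    mean = Fraction(sum(x * c for x, c in table.items()), total)
    second_moment = Fraction(sum(x * x * c for x, c in table.items()), total)
    return (second_moment - mean ** 2) * Fraction(30, scale) ** 2


print(var_30d(5), variance_of_30d_from_table(TABLE_N5, 60))    # both 1/10
print(var_30d(6), variance_of_30d_from_table(TABLE_N6, 180))   # both 17/405
```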


Another way to determine the coefficients $\zeta_k$ in the case of independence is to
compute $\operatorname{var} D_n$ for n = 5, 6, 7 from the exact distributions given in section 7,
and $\lim_{n \to \infty} n^2 \operatorname{var} D_n$ from the asymptotic distribution of $nD_n$ (section 8).


7. The exact distribution of D in the case of independence for n = 5, 6, 7.
Let $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a sample from a population with a
continuous d.f. We may confine ourselves to samples with $x_i \neq x_j$ and $y_i \neq y_j$ if
$i \neq j$. Let $(x'_1, y'_{\beta_1}), \dots, (x'_n, y'_{\beta_n})$ be a rearrangement of $(x_1, y_1), \dots, (x_n, y_n)$
such that $x'_1 < x'_2 < \cdots < x'_n$ and $y'_1 < y'_2 < \cdots < y'_n$. The permutation
$\Pi = (\beta_1, \dots, \beta_n)$ of $(1, \dots, n)$ will be referred to as the ranking of the sample $S$.

$D_n$ depends only on the ranking of the sample. We shall express this by
writing $D_n = D_n(\Pi) = D_n(\beta_1, \dots, \beta_n)$. If $(\beta'_{\alpha_1}, \dots, \beta'_{\alpha_m})$ is a
permutation of $m\,(< n)$ of the integers $1, \dots, n$ such that $\beta'_1 < \beta'_2 < \cdots < \beta'_m$,
$D_m(\beta'_{\alpha_1}, \dots, \beta'_{\alpha_m})$ is defined to be equal to $D_m(\alpha_1, \dots, \alpha_m)$. Replacing in
(4.1) $(X_\alpha, Y_\alpha)$ by $(\alpha, \beta_\alpha)$ we find

(7.1) $$D_n(\beta_1, \dots, \beta_n) = \binom{n}{5}^{-1} \sum{}' D_5(\beta_{\alpha_1}, \dots, \beta_{\alpha_5}),$$

where $\sum'$ stands for summation over all $\alpha$ such that $1 \leq \alpha_1 < \alpha_2 < \cdots < \alpha_5 \leq n$.

Denoting by $\Pi^{(i)}$ the permutation obtained from $\Pi = (\beta_1, \dots, \beta_n)$ by
omitting $\beta_i$, we have the recursion formula

(7.2) $$n\,D_n(\Pi) = \sum_{i=1}^{n} D_{n-1}(\Pi^{(i)}).$$

From (4.1) and (3.2) we obtain

$$60\,D_5(\beta_1, \dots, \beta_5) = \psi(\beta_3, \beta_1, \beta_4)\,\psi(\beta_3, \beta_2, \beta_5) + \psi(\beta_3, \beta_1, \beta_5)\,\psi(\beta_3, \beta_2, \beta_4)$$

or

(7.3) $$60\,D_5(\beta_1, \dots, \beta_5) = \begin{cases}
0 & \text{if } \beta_3 \neq 3; \\
2 & \text{if } \beta_3 = 3 \text{ and } \beta_1, \beta_2 < 3 \text{ or } \beta_1, \beta_2 > 3; \\
-1 & \text{if } \beta_3 = 3 \text{ and } \beta_1 < 3,\ \beta_2 > 3 \text{ or } \beta_1 > 3,\ \beta_2 < 3.
\end{cases}$$

We have

(7.4) $$D_n(\beta_1, \dots, \beta_n) = D_n(\beta_2, \beta_1, \beta_3, \dots, \beta_n)
= D_n(\beta_1, \dots, \beta_{n-2}, \beta_n, \beta_{n-1}) = D_n(\beta_n, \beta_{n-1}, \dots, \beta_1).$$

For n = 5 this follows from (7.3) and for general n from (7.1).

Also, by the symmetry of $D_n$ with respect to $x$ and $y$, $D_n$ does not change its
value if in the permutation $(\beta_1, \dots, \beta_n)$ the numbers 1, 2 or n − 1, n are
interchanged or the permutation is replaced by its inverse.

In the case of independence all $n!$ rankings have the same probability $1/n!$.
To find the distribution of $D_n$ we have to determine the number of rankings
giving rise to particular values of $D_n$.

If n = 5 there are 5! = 120 rankings. Owing to (7.4) we need consider only
those with $\beta_1 < \beta_2$, $\beta_4 < \beta_5$, $\beta_1 < \beta_4$. Their number is $5!/2^3 = 15$. Among
them those with $\beta_3 \neq 3$ yield $D_5 = 0$; this leaves only the three permutations

(1, 2, 3, 4, 5), (1, 4, 3, 2, 5), (1, 5, 3, 2, 4).

By (7.3) the respective values of $60\,D_5$ are 2, −1, −1. Thus we have

$$P\{60 D_5 = 2\} = \tfrac{1}{15}, \quad P\{60 D_5 = -1\} = \tfrac{2}{15}, \quad P\{60 D_5 = 0\} = \tfrac{4}{5}.$$
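
The counting argument for n = 5 can be reproduced by brute force: enumerate all 120 rankings, evaluate D for each, and tally the values. This sketch evaluates D through the rank formula (5.1)-(5.2) rather than through the symmetry reductions used in the text; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def hoeffding_d(xs, ys):
    """D of (5.1)-(5.2) for paired observations without ties."""
    n = len(xs)
    A = B = C = 0
    for i in range(n):
        a = sum(1 for j in range(n) if xs[j] < xs[i])
        b = sum(1 for j in range(n) if ys[j] < ys[i])
        c = sum(1 for j in range(n) if xs[j] < xs[i] and ys[j] < ys[i])
        A += a * (a - 1) * b * (b - 1)
        B += (a - 1) * (b - 1) * c
        C += c * (c - 1)
    numerator = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    return Fraction(numerator, n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


def null_distribution(n):
    """Number of rankings of (1, ..., n) giving each value of D_n."""
    xs = tuple(range(1, n + 1))
    return Counter(hoeffding_d(xs, ranking) for ranking in permutations(xs))


for value, count in sorted(null_distribution(5).items()):
    print(60 * value, count)   # 60*D_5 = -1, 0, 2 occur 16, 96, 8 times
```

Dividing the counts by 8 gives the entries 2, 12, 1 listed for n = 5 in Table I. The same function handles n = 6 and n = 7, at the cost of enumerating 720 and 5040 rankings.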





The distribution of $D_6, D_7, \dots$ can be obtained in a similar way using the
relations (7.1) to (7.4). The distribution of $D_n$ for n = 5, 6, 7 is given in
Table I.

From (7.3) and (7.1) it follows that $-\tfrac{1}{60} \leq D_n \leq \tfrac{1}{30}$ for n = 5, 6, ….
The upper bound is attained for $\Pi = (1, 2, \dots, n)$ and every n. To judge
by the cases n = 5, 6, 7, the minimum of $D_n$ apparently increases with n. From
$ED_n = \Delta$ it also follows that $\Delta \leq \tfrac{1}{30}$.


8. The Asymptotic Distribution of $nD_n$ in the Case of Independence.

Theorem 8.1. If $F(x, y) = F(x, \infty)F(\infty, y)$ and $F(x, \infty)$ and $F(\infty, y)$ are
continuous, the random variable $nD_n + \tfrac{1}{36}$ has a limiting distribution whose
characteristic function (c.f.) is

(8.1) $$g(t) = \prod_{k=1}^{\infty} \left(1 - \frac{2it}{\pi^4 k^2}\right)^{-\tau(k)/2},$$

where $\tau(k)$ is the number of divisors of $k$.

Note that $\tau(k)$ is the number of divisors of $k$ including 1 and $k$. Thus $\tau(1) = 1$,
$\tau(2) = 2$, $\tau(3) = 2$, $\tau(4) = 3$, ….

The author has not been able to bring the d.f. corresponding to the c.f. $g(t)$
into a form suitable for numerical computation. Thus Theorem 8.1 may be
considered as a preliminary result. For this reason only a brief indication of
the proof is given here.

If $(X_1, Y_1), \dots, (X_n, Y_n)$ is a random sample from a population with d.f.
$F(x, \infty)F(\infty, y)$, let $nS_n(x, y)$ be the number of sample members $(X_i, Y_i)$ such
that $X_i \leq x$, $Y_i \leq y$. $S_n(x, y)$ is a d.f. depending on the random sample. If
we put $F(x, y) = S_n(x, y)$ in $\Delta(F)$ as defined by (3.3), we get

$$\Delta(S_n) = \frac{1}{n^5} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_5=1}^{n} \phi(X_{\alpha_1}, Y_{\alpha_1}; \dots; X_{\alpha_5}, Y_{\alpha_5}).$$

It is easy to prove that if $n\{\Delta(S_n) - E\Delta(S_n)\}$ has a limiting distribution, it is
the same as that of $nD_n$.

Now it can be shown that $n\Delta(S_n)$ has a limiting distribution with the c.f. (8.1).
This can be done either analogously to Smirnoff's [6] derivation of the limiting
distribution of the goodness of fit statistic $\omega^2$, or applying von Mises' [7] general
results on the asymptotic distribution of a differentiable statistical function.
Though the latter paper deals only with univariate distributions, its results can
be extended to the multivariate case.

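By expanding $\log g(t)$ in powers of $it$ we obtain for the j-th cumulant

$$\kappa_j = \frac{2^{5j-3}\,(j-1)!\,B_{2j-1}^2}{((2j)!)^2},$$

where $B_{2j-1}$ are Bernoulli's numbers,

$$B_1 = \tfrac{1}{6}, \quad B_3 = \tfrac{1}{30}, \quad B_5 = \tfrac{1}{42}, \quad B_7 = \tfrac{1}{30}, \quad \dots$$

In particular, $\kappa_1 = \tfrac{1}{36}$, and since $ED_n = 0$, the limiting distribution of $n\Delta(S_n)$
is that of $nD_n + \tfrac{1}{36}$.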


9. The D-test of Independence. Given a random sample from a bivariate
population with continuous d.f., a test for independence can now be carried out
as follows:

If $\alpha$ $(0 < \alpha < 1)$ is the desired level of significance, let $\rho_n$ be the smallest number
satisfying the inequality

$$P\{D_n > \rho_n \mid F \in \omega\} \leq \alpha,$$

where $\omega$ is the class of d.f.'s of the form $F(x, \infty)F(\infty, y)$.

Compute $D_n$ as shown in section 5. Reject the hypothesis $H_0$ of independence
if and only if $D_n > \rho_n$.

For n = 5, 6, 7 the numbers $\rho_n$ can be obtained from Table I.
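
Reading the critical value off Table I can be sketched as follows. The dictionary below holds the n = 7 column of the table (values of $1260\,D_7$ with their counts out of 630); `critical_value` is a hypothetical helper that scans the support for the smallest point whose strict upper tail does not exceed the level.

```python
from fractions import Fraction

# Table I, n = 7: value of 1260*D_7 -> count out of 630 ranking classes.
TABLE_N7 = {-11: 8, -8: 32, -7: 32, -6: 8, -5: 28, -4: 88, -3: 64, -2: 56,
            -1: 8, 0: 88, 2: 77, 3: 24, 4: 4, 6: 56, 8: 8, 9: 4, 12: 24,
            14: 2, 18: 12, 24: 2, 30: 4, 42: 1}


def critical_value(table, scale, alpha):
    """Smallest support point rho with P{D_n > rho} <= alpha under independence."""
    total = sum(table.values())
    for x in sorted(table):
        upper_tail = sum(c for v, c in table.items() if v > x)
        if Fraction(upper_tail, total) <= alpha:
            return Fraction(x, scale)
    raise ValueError("no critical value found")  # not reached for alpha >= 0


rho_7 = critical_value(TABLE_N7, 1260, Fraction(1, 20))
print(rho_7)  # 1/105, i.e. reject when 1260*D_7 > 12
```

Because the null distribution is discrete, the attained size (here 21/630) is in general smaller than the nominal level.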

From Tchebycheff's inequality and (6.1) we have

$$P\left\{30 D_n \geq \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\,\alpha}}\right\} \leq \alpha.$$

Hence

$$30\rho_n \leq \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\,\alpha}}.$$

It follows that $\rho_n = O(n^{-1})$.
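
The Tchebycheff bound is easy to evaluate, and doing so shows both how conservative it is for small n and how the $O(n^{-1})$ rate arises. The following sketch uses floating point; the function name is illustrative.

```python
import math


def chebyshev_bound(n, alpha):
    """Upper bound on the critical value rho_n implied by (6.1)."""
    var_30d = 2 * (n * n + 5 * n - 32) / (9 * n * (n - 1) * (n - 3) * (n - 4))
    return math.sqrt(var_30d / alpha) / 30


# For n = 7 and alpha = 0.05 the exact critical value from Table I is 12/1260.
print(chebyshev_bound(7, 0.05), 12 / 1260)

# n * bound settles near sqrt(2 / (9 * alpha)) / 30, so the bound is O(1/n).
for n in (10, 100, 1000):
    print(n, n * chebyshev_bound(n, 0.05))
```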

If $\Delta > 0$, we have $\Delta - \rho_n > 0$ for sufficiently large n. Then

$$P\{D_n > \rho_n\} \geq P\{|D_n - \Delta| < \Delta - \rho_n\} \geq 1 - (\operatorname{var} D_n)/(\Delta - \rho_n)^2.$$

By (4.4) the right hand side tends to 1.

This, together with Theorem 3.1, shows that the D-test is consistent with
respect to the class $\Omega''$.

Since $P\{D_n \leq 0\}$ tends to 0 if $\Delta > 0$, it is safe not to reject $H_0$ whenever
$D_n \leq 0$. An inspection of Table I shows that at least for small n this will
happen in more than one-half of the cases if $H_0$ is true.


10. Concluding Remarks. It would be interesting to compare the power of
the D-test with that of other tests with respect to particular alternatives, for
instance with the product moment correlation test when the population is normal
with correlation $\rho$. A preliminary investigation seems to indicate that for small
values of $|\rho|$ and $n \to \infty$ the power efficiency of the D-test as compared with the
product moment correlation test is rather low. This result may not be conclusive
for values of n which are of practical interest. On the other hand, it may be
expected that a test which is consistent with respect to a large class of alternatives
will have a lower power with regard to a sub-class of alternatives than a test
which has optimum properties with respect to this particular sub-class. These
considerations suggest the problem of selecting from a given class of
non-parametric tests (such as those consistent with respect to $\Omega''$) a test which is most
powerful with respect to certain parametric alternatives (such as normal
distributions).


TABLE I

The distribution of $D_n$ in the case of independence for n = 5, 6, 7.

n = 5

    x    15 P{60 D_5 = x}    P{60 D_5 >= x}
   -1           2                1.0000
    0          12                0.8667
    2           1                0.0667

n = 6

    x    90 P{180 D_6 = x}   P{180 D_6 >= x}
   -2           4                1.0000
   -1          28                0.9556
    0          36                0.6444
    1          16                0.2444
    2           1                0.0667
    3           4                0.0556
    6           1                0.0111

n = 7

    x    630 P{1260 D_7 = x}  P{1260 D_7 >= x}
  -11           8                1.0000
   -8          32                0.9873
   -7          32                0.9365
   -6           8                0.8857
   -5          28                0.8730
   -4          88                0.8286
   -3          64                0.6889
   -2          56                0.5873
   -1           8                0.4984
    0          88                0.4857
    2          77                0.3460
    3          24                0.2238
    4           4                0.1857
    6          56                0.1794
    8           8                0.0905
    9           4                0.0778
   12          24                0.0714
   14           2                0.0333
   18          12                0.0302
   24           2                0.0111
   30           4                0.0079
   42           1                0.0016


APPENDIX

A. Equiprobable rankings and independence. Let $\Pi_{n\nu}$ $(\nu = 1, 2, \dots, n!)$
be the $n!$ possible rankings of samples of size n from a bivariate population with
continuous d.f. $F(x, y)$ (cf. section 7).

If $F(x, y) = F(x, \infty)F(\infty, y)$ we have

(A1) $$P\{\Pi_{n\nu}\} = 1/n! \quad (\nu = 1, \dots, n!)$$

for every n.

Does (A1) for some particular n imply independence? This is not true for
n = 2. In this case (A1) is equivalent to $P\{(1, 2)\} = \tfrac{1}{2}$. If the distribution
has a p.d. $f(x, y)$, we have

$$P\{(1, 2)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[\int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,du\,dv + \int_{x}^{\infty}\int_{y}^{\infty} f(u, v)\,du\,dv\right] f(x, y)\,dx\,dy,$$

which equals $\tfrac{1}{2}$ whenever $f(x, y) = f(-x, y)$. However, we have the following
theorem:

Theorem. If $F(x, y)$ is in $\Omega''$ and (A1) holds for some $n \geq 5$, then
(A2) $$F(x, y) = F(x, \infty)F(\infty, y).$$

Proof. (4.2) can be written in the form

(A3) $$\sum_{\nu=1}^{n!} D_n(\Pi_{n\nu})\,P\{\Pi_{n\nu}\} = \Delta.$$

If (A1) holds, the left hand side of (A3) has the same value as when (A2) is true.
But in the latter case we have $\Delta = 0$. Hence (A1) implies $\Delta = 0$. By Theorem
3.1 this is sufficient for (A2). The proof is complete.


B. Non-existence of unbiased rank tests of independence.

Theorem. There do not exist rank tests of independence which are unbiased on
any significance level with respect to the classes $\Omega'$ or $\Omega''$.

Proof: Let $\Pi_{n\nu}$ have the meaning of Appendix A. Any critical region of a
rank test of independence is a set $S_m = \{\Pi_{n\nu_1}, \dots, \Pi_{n\nu_m}\}$ of m rankings. In
the case of independence $P(S_m) = P\{\Pi_{n\nu} \in S_m\} = m/n!$. We may confine
ourselves to significance levels $m/n!$, $m = 1, 2, \dots, n! - 1$. To prove the
theorem it is sufficient to show that for every $n = 2, 3, \dots$, for some
$m\,(1 \leq m \leq n! - 1)$ and every $S_m$ there exists a d.f. $F$ in $\Omega''$ such that

$$P(S_m \mid F) < m/n!.$$

We shall prove the slightly more general proposition that this holds for
m = 1, 2, 3.

Let the bivariate distribution $A_n$ be such that the probability mass is
distributed uniformly on the n − 1 segments

(B1) $$\frac{k-1}{n-1} \leq x \leq \frac{k}{n-1}, \quad y - x = \frac{n - 2k}{n-1}, \quad (k = 1, 2, \dots, n-1),$$

and is zero in any region not containing a part of these segments.

Let $B_n$ be the distribution which is uniform on the n − 1 segments

(B2) $$\frac{k-1}{n-1} \leq x \leq \frac{k}{n-1}, \quad x + y = \frac{2k-1}{n-1}, \quad (k = 1, 2, \dots, n-1),$$

and zero elsewhere.



The d.f.'s of both $A_n$ and $B_n$ are continuous, with

$$F(x, \infty) = F(\infty, x) = x \quad (0 \leq x \leq 1).$$

Since the probability of $(X, Y)$ lying on any one of the segments (B1) or (B2)
is $1/(n-1)$, the probabilities $P(\Pi \mid A_n)$ and $P(\Pi \mid B_n)$ are easily obtained in
terms of the multinomial distribution with n − 1 equal probabilities. In
particular, we have

(B3) $$P(1, 2, \dots, n \mid A_2) = 1; \quad P(n, n-1, \dots, 1 \mid B_2) = 1,$$

(B4) $$P(1, 2, \dots, n \mid A_n) = P(n, n-1, \dots, 1 \mid B_n) = (n-1)\left(\frac{1}{n-1}\right)^{n} = \left(\frac{1}{n-1}\right)^{n-1},$$
$$P(n, n-1, \dots, 1 \mid A_n) = P(1, 2, \dots, n \mid B_n) = 0.$$

In general, if $\Pi_n$ is any permutation of $1, \dots, n$, we have either $P(\Pi_n \mid A_n) = 0$
or $P(\Pi_n \mid B_n) = 0$. For any $\Pi_n$ with $P(\Pi_n \mid A_n) \neq 0$ contains at least one
"run up" of 2 or more numbers (a sequence of consecutive numbers
$i, i+1, \dots, i+k$) which is not preceded by smaller numbers or followed by
larger numbers. On the other hand, if a $\Pi_n$ with $P(\Pi_n \mid B_n) \neq 0$ contains a
"run up", it is either preceded by smaller numbers or followed by larger numbers.
Hence if $P(\Pi_n \mid A_n) \neq 0$, then $P(\Pi_n \mid B_n) = 0$. Similarly, $P(\Pi_n \mid B_n) \neq 0$
implies $P(\Pi_n \mid A_n) = 0$.

From (B3) it follows that for any set $S_m$ of m rankings which does not include
$(1, 2, \dots, n)$ or $(n, n-1, \dots, 1)$ we have either $P(S_m \mid A_2) = 0$ or
$P(S_m \mid B_2) = 0$. Hence we need only consider critical regions containing both
$(1, 2, \dots, n)$ and $(n, n-1, \dots, 1)$. For m = 1 there are no such regions. For
m = 2 there is just one. But from (B4) it follows that for n > 2,

$$P(1, 2, \dots, n \mid A_n) + P(n, n-1, \dots, 1 \mid A_n) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{2}{n!}.$$

Finally, if $\Pi_n$ is any permutation other than $(1, 2, \dots, n)$ or $(n, n-1, \dots, 1)$,
we have, by the preceding arguments, either for $A_n$ or for $B_n$,

$$P(1, 2, \dots, n) + P(n, n-1, \dots, 1) + P(\Pi_n) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{3}{n!}.$$

This completes the proof for d.f.'s in $\Omega'$. To prove the theorem for d.f.'s in
$\Omega''$ we can replace the distributions $A_n$ and $B_n$ by distributions $A'_n$ and $B'_n$ having
continuous joint and marginal densities and such that the probabilities $P(\Pi \mid A'_n)$
and $P(\Pi \mid B'_n)$ differ as little as we please from $P(\Pi \mid A_n)$ and $P(\Pi \mid B_n)$,
respectively. For instance, $A'_2$ can be defined by the continuous density

$$f(x, y) = \begin{cases}
K(\epsilon - y + x) & \text{if } 0 \leq y - x \leq \epsilon,\ x \leq 1 - \epsilon,\ y \geq \epsilon; \\
K(\epsilon - x + y) & \text{if } -\epsilon \leq y - x \leq 0,\ x \geq \epsilon,\ y \leq 1 - \epsilon; \\
K(x + y - \epsilon) & \text{if } x + y \geq \epsilon,\ x \leq \epsilon,\ y \leq \epsilon; \\
K(2 - \epsilon - x - y) & \text{if } x + y \leq 2 - \epsilon,\ x \geq 1 - \epsilon,\ y \geq 1 - \epsilon; \\
0 & \text{elsewhere},
\end{cases}$$

where $K = 3/(3\epsilon^2 - 4\epsilon^3)$ and $0 < \epsilon < \tfrac{1}{2}$. If $\epsilon$ is taken sufficiently small, the
distribution satisfies the requirements. The details are left to the reader.

The proof also shows the non-existence of an unbiased rank test of
independence for n = 2 and any level of significance (for we need consider only one
level, $\tfrac{1}{2}$). It also can be shown that for n = 3, any $m = 1, 2, \dots, 5$ and any
$S_m$ the inequality $P(S_m) < m/3!$ holds for at least one of the distributions
$A_2$, $A_3$, $B_2$, $B_3$. The question remains open whether there exist rank tests of
independence which are unbiased for some sample sizes n and some significance
levels $m/n!$.


REFERENCES

[1] H. Scheffé, "Statistical inference in the non-parametric case," Annals of Math. Stat.,
Vol. 14 (1943), pp. 305-332.

[2] H. Hotelling and M. R. Pabst, "Rank correlation and tests of significance involving
no assumptions of normality," Annals of Math. Stat., Vol. 7 (1936), pp. 29-43.

[3] M. G. Kendall, "A new measure of rank correlation," Biometrika, Vol. 30 (1938),
pp. 81-93.

[4] J. Wolfowitz, "Additive partition functions and a class of statistical hypotheses,"
Annals of Math. Stat., Vol. 13 (1942), pp. 247-279.

[5] W. Hoeffding, "A class of statistics with asymptotically normal distribution," Annals
of Math. Stat., Vol. 19 (1948), pp. 293-325.

[6] N. V. Smirnoff, "On the distribution of Mises' $\omega^2$-criterion," (Russian, with French
summary), Matematicheskii Sbornik, Nov. Ser., Vol. 2 (1937), pp. 973-993.

[7] R. von Mises, "On the asymptotic distribution of differentiable statistical functions,"
Annals of Math. Stat., Vol. 18 (1947), pp. 309-348.



ON PREDICTION IN STATIONARY TIME SERIES 

By Herman O. A. Wold
Uppsala University 

Summary. In time series analysis there are two lines of approach, here called 
the functional and the stochastic. In the former case, the given time series is 
interpreted as a mathematical function, in the latter case as a random specimen 
out of a universe of mathematical functions. The close relation between the 
two approaches is in section 2 shown to amount to a genuine isomorphism. 
Considering the problem of prediction from this viewpoint, the author gives in 
sections 3-4 the functional equivalence of his earlier theorem on the decomposition
of a stationary stochastic process with a discrete time parameter (see [9],
theorem 7). In section 5 the decomposition theorem is applied to the problem 
of linear prediction. Finally in section 6 a few comments are made. Since 
various aspects of the isomorphism in question are known, this paper might be 
regarded as essentially expository. 

1. Introductory. Let the sequence 

(1) $$\dots, x_{t-1}, x_t, x_{t+1}, \dots$$

be an empirical time series such that no clear trend is present in the average 
level, in the variance or in any other structural properties of the series which we 
might choose to consider. Such series are usually called stationary as distinct 
from evolutive , terms which of course are somewhat loose when referring to 
empirical data. We shall consider two approaches in the theoretical analysis of 
stationary series. It is convenient to allow $x_t$ to be complex; the conjugate
complex of $x_t$ is denoted $\bar{x}_t$.

In the functional approach, the sequence (1) is regarded as forming an infinite
sequence, say $\{x_t\}$, where $t$ runs from $-\infty$ to $+\infty$. To define stationarity, let
us for any infinite sequence $\{z_t\}$ write

(2) $$M[z_t] = \lim \frac{1}{t_2 - t_1 + 1} \sum_{t=t_1}^{t_2} z_t \quad (t_1 \to -\infty,\ t_2 \to +\infty).$$

The limit $M[z_t]$, which will be called "the average of z", is clearly independent
of $t$. It is also seen that a necessary and sufficient condition for $M[z_t]$ to exist is
that the same average should be obtained when $t_1$ is kept fixed while $t_2 \to +\infty$,
and when $t_2$ is kept fixed while $t_1 \to -\infty$. The stationarity of the sequence (1)
may now be brought out by assumptions of the type that the averages $M[x_t]$ and
$M[x_t \cdot \bar{x}_{t+k}]$ exist, say

(3) $$M[x_t] = m, \quad M[x_t \cdot \bar{x}_{t+k}] = r_k \quad (k = 0, \pm 1, \pm 2, \dots).$$

In the stochastic (or probabilistic) approach, we introduce an infinite sequence
of random variables, say

(4) $$\dots, \xi_{t-1}, \xi_t, \xi_{t+1}, \dots \quad (-\infty < t < +\infty),$$

or briefly $\{\xi_t\}$. The sequence $\{\xi_t\}$ may be regarded as the generalization of the
notion of multi-dimensional variable, say $[\xi_1, \dots, \xi_n]$, to an infinite number of
components $\xi_t$. According to a basic theorem by A. Kolmogoroff (see e.g. [9],
§11), the probability distribution of the sequence $\{\xi_t\}$ may be defined by
specifying for any finite set of variables, say $[\xi_{t_1}, \dots, \xi_{t_n}]$, its multi-dimensional
distribution function, say

(5) $$F(u_1, \dots, u_n; t_1, \dots, t_n) = \operatorname{Prob}(\xi_{t_1} \leq u_1, \dots, \xi_{t_n} \leq u_n).$$

The sequence $\{\xi_t\}$ thus defined is said to constitute a stochastic process. As is
sufficient for our purpose, we confine ourselves to the case when the time
parameter $t$ is restricted to discrete values, $t = 0, \pm 1, \pm 2, \dots$.



Now in the stochastic approach, the empirical time series (1) is regarded as a
sample specimen, a realization, of the stochastic process $\{\xi_t\}$, just as a point
$[x_1, \dots, x_n]$ in an n-dimensional space may be regarded as a sample specimen
of a multidimensional variable $[\xi_1, \dots, \xi_n]$. In line with this interpretation, the
process $\{\xi_t\}$ may be regarded as a universe of individual realizations such as (1)
(see the graph). Taking out a realization at random from this universe, we shall
have the probability,

$$F(u_1; t_1) = \operatorname{Prob}(\xi_{t_1} \leq u_1),$$

that the value taken on by the realization at the time point $t_1$ will be $\leq u_1$;
similarly,

$$F(u_1, u_2; t_1, t_2) = \operatorname{Prob}(\xi_{t_1} \leq u_1,\ \xi_{t_2} \leq u_2),$$

is the joint probability that the values taken on by the realization at $t_1$ and $t_2$
will be $\leq u_1$ and $\leq u_2$ respectively.

Any expectation referring to the variables (4) may be expressed in terms of the
distribution functions (5), for instance

$$E[\xi_t] = \int_{-\infty}^{\infty} u\,d_u F(u; t), \quad E[\xi_{t_1} \cdot \xi_{t_2}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u \cdot v\,d_{u,v} F(u, v; t_1, t_2).$$

Again interpreting in terms of the universe of realizations, $E[\xi_t]$, say, is the
average, over this universe, of the value taken by the realizations at the time point $t$.

The above definition of a stochastic process (4) being perfectly general, we have
to impose special assumptions if we wish to take into account particular
properties of the given time series (1). Thus stationarity of the process (4) may be
defined by assuming that any probability of the type (5) will remain the same
if $t_1, \dots, t_n$ is replaced by $t_1 + t, \dots, t_n + t$, where $t$ is arbitrary. Alternatively,
and more generally, the stationarity of the sequence (1) may be brought out in
this approach by assuming that the expectations

$$E[\xi_t] = \mu, \quad E[\xi_t \cdot \bar{\xi}_{t+k}] = \rho_k$$

exist and are independent of $t$.

2. The functional and stochastic approaches are closely related as to problems
and results. A typical example is that $r_k$ and $\rho_k$ as defined above allow the
representations 1

(6) $$r_k = \int_{-\pi}^{\pi} e^{i\lambda k}\,dF(\lambda), \quad \rho_k = \int_{-\pi}^{\pi} e^{i\lambda k}\,d\Phi(\lambda), \quad (k = 0, \pm 1, \pm 2, \dots),$$

where $F(\lambda)$ and $\Phi(\lambda)$ are real, bounded and never decreasing functions. We
shall now show that the parallelism between the two approaches amounts to a
mathematical isomorphism. On the one hand, we recall that A. Kolmogoroff
[3], [4] has introduced and studied the notion of a stationary sequence in Hilbert
space (let such a sequence be denoted $\{X_t\}$), and shown that a stationary
stochastic process $\{\xi_t\}$ forms a particular realization of this general, abstract
$\{X_t\}$. On the other hand the following elementary lemma shows that another
realization of $\{X_t\}$ may be formed on the basis of a stationary sequence $\{x_t\}$
such as (1).

Lemma. Let $\{x_t\}$ be a sequence of type (1) which satisfies the conditions (3) but
is arbitrary in other respects. We write

(7) $$\{\mathbf{x}_t\} = \dots, \mathbf{x}_{t-1}, \mathbf{x}_t, \mathbf{x}_{t+1}, \dots,$$

where $\mathbf{x}_t = \{x_t\}$, and $\mathbf{x}_{t+k}$ is obtained from $\mathbf{x}_t$ by replacing $x_t$ by $x_{t+k}$ for every $t$.
For the elements $\mathbf{x}_t$, let multiplication by a real or complex constant and addition
be defined by

$$a\mathbf{x}_t = \{a x_t\}, \quad \mathbf{x}_t + \mathbf{y}_t = \{x_t + y_t\},$$

and let $R$ be the class formed by all elements of the type

$$c_{-n}\mathbf{x}_{t-n} + c_{-n+1}\mathbf{x}_{t-n+1} + \cdots + c_0\mathbf{x}_t + \cdots + c_n\mathbf{x}_{t+n},$$

where $n$ and $c_{-n}, \dots, c_n$ are arbitrary. Let the inner product $(\mathbf{x}_t, \mathbf{y}_t)$ of two
elements $\mathbf{x}_t = \{x_t\}$, $\mathbf{y}_t = \{y_t\}$ in $R$ be defined by

$$(\mathbf{x}_t, \mathbf{y}_t) = M[x_t \cdot \bar{y}_t],$$

and let $R'$ be the closure of $R$.

Then $R'$ is a space the dimension of which is denumerable or finite. In the
former case, $R'$ satisfies the conditions of a Hilbert space $H$, in the latter case it can
be extended to a Hilbert space $H$. In any case, the relations

(8) $$U\mathbf{x}_t = \mathbf{x}_{t+1}, \quad -\infty < t < +\infty,$$

define a unitary transformation $U$ in $H$.

1 As to $r_k$, see N. Wiener [8], who treats the case of a continuous time parameter $t$.
As to $\rho_k$, see H. Wold [9], p. 66, and A. Kolmogoroff [4], p. 5.

The first statement of the theorem is obvious. It is also easily verified that
$R'$ satisfies the conditions A-C of an abstract Hilbert space as defined by
B. v. Sz. Nagy [7]. If $R'$ is of finite dimension, a suitable extension will make $R'$
satisfy the conditions A-E of a Hilbert space as defined by M. H. Stone [6].
The transformation $U$ is clearly unitary; it is also plain that the definition (8)
of $U$ extends to the whole of $H$.

Now since both (4) and (7) are particular realizations of a stationary sequence
$\{X_t\}$ in Hilbert space, any theorem on such a sequence $\{X_t\}$ will give, as
immediate corollaries, similar theorems on a stationary sequence $\{x_t\}$ of type (1) and
on a stationary stochastic process $\{\xi_t\}$. Generally speaking, the former
corollary will involve averages of one or more functional sequences $\{x_t\}, \{y_t\}, \dots$
over time $t$, while the latter will involve averages, for fixed $t$, over the realizations
of one or more stochastic processes $\{\xi_t\}, \{\eta_t\}, \dots$.

Let us consider the following problem of prediction in the light of the
isomorphism established: Suppose the data (1) are known up to $t - 1$, say for
$t - 1, t - 2, \dots, t - n$, what can then be said about $x_t$, or, more generally,
about $x_{t+k}$? One approach to the problem is to apply harmonic analysis to the
given data, and to extrapolate the function obtained up to the time point $t + k$.
Another approach, the one which we shall consider, is to approximate $x_{t+k}$
directly in terms of the given data. Confining ourselves to linear prediction,
and making use of n observations, the prediction formula will then be

(9) $$\text{pred. } x_{t+k} = a_0^{(n,k)} + a_1^{(n,k)} x_{t-1} + a_2^{(n,k)} x_{t-2} + \cdots + a_n^{(n,k)} x_{t-n}.$$

The error of prediction, also called the residual, is denoted

(10) $$y_{t+k}^{(n,k)} = x_{t+k} - \text{pred. } x_{t+k}.$$





Considering first the functional approach, we apply formula (9) for all $t$,
thus obtaining the residuals

$$\dots, y_{t-1}^{(n,k)}, y_t^{(n,k)}, y_{t+1}^{(n,k)}, \dots$$

In this approach we are led to regard the residual variance, i.e.

(11) $$M[\,|y_t^{(n,k)}|^2\,],$$

as a total measure of the accuracy of the prediction. If we follow the stochastic
approach, on the other hand, the formula (9) is applied, for fixed $t$, to all
realizations $\{x_t\}$ of the process $\{\xi_t\}$. In this case, the variance expectation,

(12) $$E[\,|\eta_t^{(n,k)}|^2\,],$$

is regarded as a total measure of the accuracy of the prediction. The prediction
coefficients $a_i^{(n,k)}$ are determined by minimizing the expressions (11) and (12),
respectively. 2 It needs no further comment that the two lines of approach in
prediction theory will, thanks to the isomorphism indicated, lead to parallel
results.
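
For a real sequence observed over a finite window, the minimization of the residual variance (11) is ordinary least squares. The sketch below treats the simplest case n = 1, k = 0 of formula (9), replacing the infinite-time average $M[\cdot]$ by an average over the available pairs; that replacement, the restriction to real data, and the function names are assumptions of the illustration.

```python
from fractions import Fraction


def fit_one_lag(xs):
    """Least-squares (a0, a1) for pred x_t = a0 + a1 * x_{t-1} on a finite window."""
    prev = [Fraction(v) for v in xs[:-1]]
    curr = [Fraction(v) for v in xs[1:]]
    m = len(prev)
    mean_prev = sum(prev) / m
    mean_curr = sum(curr) / m
    cov = sum(p * c for p, c in zip(prev, curr)) / m - mean_prev * mean_curr
    var = sum(p * p for p in prev) / m - mean_prev ** 2
    a1 = cov / var
    a0 = mean_curr - a1 * mean_prev
    return a0, a1


def residual_variance(xs, a0, a1):
    """Finite-window analogue of (11) for the one-lag predictor."""
    residuals = [Fraction(c) - (a0 + a1 * Fraction(p))
                 for p, c in zip(xs[:-1], xs[1:])]
    return sum(r * r for r in residuals) / len(residuals)


# A strictly alternating series is predicted without error from one lag.
series = [1, -1, 1, -1, 1, -1, 1, -1]
a0, a1 = fit_one_lag(series)
print(a0, a1, residual_variance(series, a0, a1))  # 0 -1 0
```

A residual variance of zero is the extreme case in which the past determines the future completely.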

In a study of stationary stochastic processes, the author has earlier found a 
decomposition theorem which has a direct bearing on the prediction problem 
(see [9], theorem 7). The main purpose of the present note is to develop the 
corresponding decomposition for a functional sequence of the type (1). Two 
theorems on this line are given in sections 3-4. The proofs are briefly indicated; 
for further details, the reader is referred to my treatment on the stationary 
process [9]. In section 5, the decomposition is applied to the prediction problem. 
A few comments follow in section 6. 


3. Auto-regression analysis of stationary time series. Let $\{x_t\}$ be an infinite
sequence (1) such that the conditions (3) are fulfilled. By (9)-(10), the
residuals $y_t^{(n,0)}$ will be well-defined for every n and $t$. According to elementary
properties of least square residuals, we have

(13) $$M[y_t^{(n,0)}] = 0; \quad M[y_t^{(n,0)} \cdot \bar{x}_{t-k}] = 0 \text{ for } k = 1, 2, \dots, n.$$

Since the minimum variance cannot increase if we replace n by n + 1, we further
have

$M[\,|x_t|^2\,] \ge M[\,|y_t^{(n,0)}|^2\,] \ge M[\,|y_t^{(n+1,0)}|^2\,] \ge 0.$

Making $n \to \infty$, we infer that there is a constant $d^2$ such that

$\lim_{n \to \infty} M[\,|y_t^{(n,0)}|^2\,] = d^2 \ge 0.$
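The monotone decrease of the residual variance, and the orthogonality relations (13), can be checked numerically. The following is only a sketch under stated assumptions: the phase averages M[·] are replaced by finite-sample averages over a common range of t, the test series is a simulated second-order autoregression with arbitrarily chosen coefficients, and the helper names `solve` and `fit_residuals` are hypothetical, not taken from the paper.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def fit_residuals(x, n, start):
    """Least-squares fit of x[t] on 1, x[t-1], ..., x[t-n] for t >= start; returns the residuals."""
    rows = [[1.0] + [x[t - j] for j in range(1, n + 1)] for t in range(start, len(x))]
    target = [x[t] for t in range(start, len(x))]
    k = n + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    coef = solve(gram, rhs)
    return [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, target)]

# Illustrative series (an assumption): x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + noise.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))

max_n = 5  # every fit uses the same time points t >= max_n, so the models are nested
mean_square_x = sum(v * v for v in x[max_n:]) / len(x[max_n:])
variances = []
for n in range(1, max_n + 1):
    res = fit_residuals(x, n, start=max_n)
    variances.append(sum(e * e for e in res) / len(res))

# Sample analogue of (13) for n = 3: zero mean and zero cross-moments with the lagged values.
res3 = fit_residuals(x, 3, start=max_n)
mean_residual = sum(res3) / len(res3)
cross_moments = [
    sum(e * x[max_n + i - k] for i, e in enumerate(res3)) / len(res3) for k in (1, 2, 3)
]
```

Because each larger n only adds a regressor to the same least-squares problem, `variances` cannot increase with n, which is the finite-sample counterpart of the inequality chain above.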

$^2$ For real sequences $\{x_t\}$ and $\{\xi_t\}$, this minimization is, of course, nothing else than the
method of least squares.





Making use of the Gram-Schmidt orthogonalization procedure, it is further
possible to show that there exists a sequence $\{y_t\}$ such that

$\lim_{n \to \infty} M[\,|y_t^{(n,0)} - y_t|^2\,] = 0.$

In the usual terminology, the sequence $\{y_t\}$ is the limit in the mean of the
sequence $\{y_t^{(n,0)}\}$,

(14) $\operatorname*{l.i.m.}_{n \to \infty} (\cdots, y_{t-1}^{(n,0)}, y_t^{(n,0)}, y_{t+1}^{(n,0)}, \cdots) = (\cdots, y_{t-1}, y_t, y_{t+1}, \cdots).$

We may remark that (14) does not necessarily imply that $y_t^{(n,0)}$ will for a fixed
t have $y_t$ for an ordinary limit. We also note that the limiting sequence $\{y_t\}$
is not uniquely determined; for instance, the relation (14) remains valid if a
finite number of the elements $y_t$ are modified.

As is easily shown, we have

(15) $\lim_{n \to \infty} M[\,|y_t^{(n,0)}|^2\,] = M[\,|y_t|^2\,] = M[y_t \cdot x_t] = d^2 \ge 0,$

and [cf. (13)]

(16) $M[y_t \cdot x_{t-k}] = 0, \quad k = 1, 2, \cdots.$

Moreover, the sequence $\{y_t\}$ is non-autocorrelated, i.e.

(17) $M[y_t y_{t+k}] = 0, \quad k = \pm 1, \pm 2, \cdots.$

In fact, observing that

$M[y_t y_{t-k}] = \lim_{n \to \infty} M[y_t^{(n,0)} \cdot y_{t-k}^{(n,0)}], \quad k = 1, 2, \cdots,$

and supposing that (17) is not true, we would have

(18) $|\,M[y_t^{(\nu,0)} \cdot y_{t-k}^{(\nu,0)}]\,| > a > 0,$

as $\nu$ runs through some sequence $n_1, n_2, \cdots$, such that $n_i \to \infty$. The relation
(18), however, would imply

(19) $M[\,|y_t^{(\nu,0)} - c\,y_{t-k}^{(\nu,0)}|^2\,] < d^2 (1 - W)$

for some sufficiently large $\nu$ and for some suitable c. Since $y_t^{(\nu,0)} - c\,y_{t-k}^{(\nu,0)}$ is a
linear expression of the type appearing in the right hand member of (9), the
relation (19) is incompatible with (15). Thus (18) is not possible and (17) must
hold good.

Part of the above analysis is summed up in

Theorem 1. Given a time series $\{x_t\}$ which satisfies (3), let $\epsilon > 0$ be arbitrary.
Then an integer n and a set of coefficients $a_i^{(n,0)}$ exist for which (9) defines a residual
series $\{y_t^{(n,0)}\}$ such that

$M[y_t^{(n,0)}] = 0, \quad |\,M[y_t^{(n,0)} \cdot y_{t+k}^{(n,0)}]\,| < \epsilon, \quad k = \pm 1, \pm 2, \cdots.$




4. A decomposition theorem. We shall first consider the special case where
(15) gives

(20) $M[\,|y_t|^2\,] = d^2 = 0,$

which is the same as

$\operatorname*{l.i.m.}_{n \to \infty} (\cdots, y_{t-1}^{(n,0)}, y_t^{(n,0)}, \cdots) = (\cdots, 0, 0, \cdots).$

In this case we shall say that the sequence $\{x_t\}$ is deterministic,$^3$ the interpretation
of this term being as follows: Given the sequence $\{x_t\}$ for all time points up
to and including t − 1, we may, by the use of a finite number of the given values,
predict $x_{t+k}$ with any accuracy; i.e., with a residual error of arbitrarily small
variance. This can be shown by induction. In fact, suppose that we are able
to predict each of $x_t, \cdots, x_{t+k-1}$ in such a way that the prediction error has a
variance $< \epsilon$, where $\epsilon$ is arbitrarily prescribed. Letting $\delta > 0$ be arbitrary, we
can then find a formula of type (9) which predicts $x_{t+k}$ in terms of the exact
values $x_{t+k-1}, x_{t+k-2}, \cdots$ and which gives a residual variance $< \delta/(k + 1)$.
Replacing here $x_{t+k-1}, \cdots, x_t$ by values so predicted that the residual variances
are less than $\delta/(k + 1)|a_1^{(n,0)}|, \cdots, \delta/(k + 1)|a_k^{(n,0)}|$, it is seen that the total
error of (9) will have a variance $< \delta$.
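A small numerical illustration of a deterministic sequence in this sense, offered as a sketch only. It assumes a pure sinusoid $x_t = \cos(\omega t + \varphi)$, which obeys the exact two-term recursion $x_t = 2\cos\omega \, x_{t-1} - x_{t-2}$; the frequency, phase and the helper name `predict_ahead` are illustrative choices, not taken from the paper.

```python
import math

omega, phase = 0.7, 0.3  # illustrative values (assumptions)
x = [math.cos(omega * t + phase) for t in range(200)]

# One-step formula of type (9) with n = 2 and zero constant term.
a1, a2 = 2.0 * math.cos(omega), -1.0
residuals = [x[t] - (a1 * x[t - 1] + a2 * x[t - 2]) for t in range(2, len(x))]
residual_variance = sum(e * e for e in residuals) / len(residuals)

def predict_ahead(x, t, k):
    """Predict x[t + k] from x[t - 1] and x[t - 2] only, by iterating the recursion."""
    prev2, prev1 = x[t - 2], x[t - 1]
    for _ in range(k + 1):
        prev2, prev1 = prev1, a1 * prev1 + a2 * prev2
    return prev1

# Errors of the k-step predictions made from the two values before t = 50.
errors = [abs(predict_ahead(x, 50, k) - x[50 + k]) for k in range(11)]
```

The residual variance vanishes up to rounding, and the induction described above corresponds to feeding each predicted value back into the same formula.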

We proceed to the general case, $d^2 > 0$. According to the above analysis,
$y_t$ is that part of $x_t$ which cannot be linearly predicted from the previous
observations $x_{t-1}, x_{t-2}, \cdots$. In other words, each time point t brings in an
unpredictable, random-like element $y_t$ in the series $\{x_t\}$. Now while from (16) $y_t$ is
uncorrelated with the previous observations $x_{t-1}, x_{t-2}, \cdots$, it will in general be
correlated with the future observations $x_{t+1}, x_{t+2}, \cdots$. Thus the unpredictable
element $y_t$ may be regarded as influencing the future development
$x_{t+1}, x_{t+2}, \cdots$ of the series $\{x_t\}$. In order to examine this influence we proceed
as follows.

We approximate $x_t$ linearly in terms of $y_t, y_{t-1}, \cdots, y_{t-n}$, writing

$x_t = b_0 y_t + b_1 y_{t-1} + \cdots + b_n y_{t-n} + u_t^{(n)} = z_t^{(n)} + u_t^{(n)}.$

Determining the coefficients $b_k$ by minimizing

$M[\,|x_t - z_t^{(n)}|^2\,],$

the coefficients $b_k$ will thanks to (16)-(17) be independent of n. We obtain

$b_0 = 1; \quad b_k = M[x_t \cdot y_{t-k}]/d^2, \quad k = 1, 2, \cdots.$

The sequence $\{z_t^{(n)}\}$ thus being determined for every n, it is further easily shown
that $\{z_t^{(n)}\}$ converges in the mean, say to $\{z_t\}$,

(21) $\operatorname*{l.i.m.}_{n \to \infty} (\cdots, z_t^{(n)}, \cdots) = (\cdots, z_t, \cdots).$


$^3$ The term is due to J. Doob [1]; in my study [9] I used the term singular.





We may thus write

$z_t = y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots,$

where the sum converges in the mean. Finally, we write

(22) $x_t = z_t + u_t,$

which gives a decomposition of the series $\{x_t\}$ into two components $\{z_t\}$ and
$\{u_t\}$.

In the decomposition (22) the component $z_t$ is that part of $x_t$ which is linearly
built up by the unpredictable elements $\{y_t\}$ up to and including the time point
t. From (17) we know that the sequence $\{y_t\}$ is non-autocorrelated. It can
further be shown that the square modulus sum of the coefficients $b_k$ is convergent,

$\sum_{k=0}^{\infty} |b_k|^2 < \infty.$

As to the component $u_t$, it can be shown that $\{u_t\}$ is deterministic. More
precisely, we have

$\operatorname*{l.i.m.}_{n \to \infty} \{u_t - (a_0^{(n,0)} + a_1^{(n,0)} u_{t-1} + \cdots + a_n^{(n,0)} u_{t-n})\} = \{0\},$

where the $a_i^{(n,0)}$ are the same as the minimizing coefficients of (9). It can further
be shown that $u_t$ is uncorrelated with $y_{t+k}$ and $z_{t+k}$ for all k,

$M[u_t y_{t+k}] = M[u_t z_{t+k}] = 0, \quad (k = 0, \pm 1, \pm 2, \cdots).$

Summing up the above results, we obtain

Theorem 2. Any time series $\{x_t\}$ which satisfies the conditions (3) allows the
decomposition

(23) $\{x_t\} = \{z_t + u_t\},$

with

$\{z_t\} = \operatorname*{l.i.m.}_{n \to \infty} \{y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_n y_{t-n}\},$

where the series $\{y_t\}$, $\{z_t\}$ and $\{u_t\}$ have the following properties.

A. The elements $y_t$, $z_t$ and $u_t$ are obtained from $x_t, x_{t-1}, \cdots$ by the limit
formulae (14), (21) and (22).

B. The series $\{y_t\}$ has zero mean,

$M[y_t] = 0,$

is non-autocorrelated,

$M[y_t y_{t+k}] = 0, \quad k = \pm 1, \pm 2, \cdots,$

and is uncorrelated with $\{x_{t-1}\}, \{x_{t-2}\}, \cdots$,

$M[y_t \cdot x_{t-k}] = 0, \quad k = 1, 2, \cdots.$





C. The series $\{u_t\}$ is uncorrelated with $\{y_t\}$ and $\{z_t\}$,

$M[u_t y_{t+k}] = M[u_t z_{t+k}] = 0, \quad (k = 0, \pm 1, \pm 2, \cdots).$

D. The series $\{u_t\}$ is deterministic.

5. Application to the problem of prediction. In section 1 we have considered
the problem of predicting $x_{t+k}$ linearly in terms of $x_{t-1}, x_{t-2}, \cdots$. Now it is
seen that theorem 2 gives the following formula for predicting $x_{t+k}$ with an error
of minimal variance,

$\text{pred. } x_{t+k} = u_{t+k} + b_{k+1} y_{t-1} + b_{k+2} y_{t-2} + \cdots.$

In fact, by theorem 2, A and D, the right-hand member can be calculated with
any prescribed accuracy from a finite set of observations $x_{t-1}, x_{t-2}, \cdots, x_{t-N}$,
where N of course depends on the accuracy desired; on the other hand, the
prediction error being

$y_{t+k} + b_1 y_{t+k-1} + \cdots + b_k y_t,$

we infer from theorem 2 (B) that this error is of minimal variance,

$M[\,|x_{t+k} - \text{pred. } x_{t+k}|^2\,] = (1 + |b_1|^2 + \cdots + |b_k|^2)\,d^2.$
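As a worked illustration of the last formula, the sketch below evaluates the prediction error variance for an assumed first-order autoregressive series $x_t = \varphi x_{t-1} + y_t$ with $|\varphi| < 1$, for which $u_t = 0$ and $b_j = \varphi^j$. The function name and the numerical values of $\varphi$ and $d^2$ are hypothetical and serve only to make the arithmetic concrete.

```python
def prediction_error_variance(b, d2, k):
    """Return (1 + |b_1|^2 + ... + |b_k|^2) * d2, where b[0] = b_0 = 1."""
    return d2 * sum(abs(b[j]) ** 2 for j in range(k + 1))

phi, d2 = 0.8, 1.0                  # illustrative values (assumptions)
b = [phi ** j for j in range(50)]   # b_j = phi**j for the assumed series

# Variance of the error when x_{t+k} is predicted from x_{t-1}, x_{t-2}, ...
variances = [prediction_error_variance(b, d2, k) for k in range(6)]

# For this series the sum is geometric, so a closed form is available for comparison.
closed_form = [d2 * (1 - phi ** (2 * (k + 1))) / (1 - phi ** 2) for k in range(6)]
```

The variance grows with the prediction span k and stays below $d^2/(1 - \varphi^2)$, the total variance of the assumed series.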


6. Comments. As mentioned in section 2, the above theorem 2 is the analogue
of a theorem on the decomposition of a stationary stochastic process given by the
author previously (see [9], theorem 7). The starting point is then to apply
formula (9), not as above to the same sequence $\{x_t\}$ for varying t, but to all
realizations $\{x_t\}$ of the process, holding t fixed. The close connection between
the decomposition in the two approaches is further brought out by the following
theorem.

Theorem 3. Given a stochastic process,

$\cdots, \xi(t-1), \xi(t), \xi(t+1), \cdots,$

which is stationary in the sense of (5), let $\{x_t\}$ be an individual realization of this
process. Then $\{x_t\}$ will with probability 1 allow the decomposition of theorem 2.

In fact, according to the ergodic theorem of Birkhoff-Khintchine,$^4$ the averages
(2) will exist with probability 1, and so theorem 3 follows from theorem 2. It
should be observed that the coefficients $b_k$ will in general vary from one
realization to another.

$^4$ See A. Kolmogoroff [2]. His proof refers to averages (2) of the special type where
$t_1$ is held fixed while $t_2 \to \infty$. According to the stationarity, however, the average exists,
and is the same, when $t_2$ is fixed and $t_1 \to -\infty$, and so the general average (2) will likewise
exist.

The theory of the decomposition (23) has been carried further in a brilliant
study by A. Kolmogoroff [3]. His analysis deals with the general case of a
stationary sequence in a Hilbert space. Establishing a decomposition of type (23)
for such sequences Kolmogoroff also shows that the decomposition is uniquely
determined by properties corresponding to A-D. Making use of the powerful
methods of spectral analysis of linear transformations in Hilbert space,
Kolmogoroff further presents a highly developed theory of the decomposition.

As immediate corollaries of this general theory Kolmogoroff [4] obtains
corresponding results for a stationary stochastic process $\{\xi(t)\}$ such as (4). Now thanks
to our lemma in section 2, similar theorems hold good for the functional sequence
(1). These results include detailed theorems on the connection between the
decomposition (23) and, on the other hand, the function $F(\lambda)$ which by (6)
generates the coefficients $r_k$. For example, it turns out that $\{x_t\}$ is completely
deterministic if the derivative $F'(\lambda)$ is constant over an interval of positive
measure. An explicit formula for the coefficients $b_k$ in terms of the function
$F(\lambda)$ may also be obtained. For proofs and further results, we must refer to
Kolmogoroff's papers [3]-[4].

The theory of the decomposition (23) has later been generalized in various
directions. V. Zasuhin [11] and J. Doob [1] have shown that the decomposition
applies to multi-dimensional stationary sequences. As shown by the present
author [10], the decomposition may be employed for the analysis of linear
equation systems with an infinite number of unknowns. This device makes use of
the decomposition of non-stationary sequences, a generalization indicated also
by M. Loève [5].


REFERENCES 

[1] J. L. Doob, “The elementary Gaussian processes,” Annals of Math. Stat., Vol. 15 (1944), pp. 229-282.

[2] A. Kolmogoroff, “Ein vereinfachter Beweis des Birkhoff-Khintchineschen Ergodensatzes,” Rec. Math. (Sbornik) N.S., Vol. 2 (1937), pp. 367-368.

[3] A. Kolmogoroff, “Stationary sequences in Hilbert space (Russian),” Bolletin Moskovskovo Gosudarstvenovo Universiteta, Matematika, Vol. 2 (1941).

[4] A. Kolmogoroff, “Interpolation und Extrapolation von stationären zufälligen Folgen,” Bull. Acad. Sci. URSS, Sér. Math., Vol. 5 (1941), pp. 3-14.

[5] M. Loève, “Fonctions aléatoires de second ordre,” Revue Sci., Vol. 84 (1946), pp. 195-206.

[6] M. H. Stone, Linear Transformations in Hilbert Space, Amer. Math. Soc. Colloq. Publ. 15, New York, 1932.

[7] B. v. Sz. Nagy, Spektraldarstellung linearer Transformationen des Hilbertschen Raumes, Ergebn. Math. u. Grenzgeb., Vol. 5, Berlin, 1942.

[8] N. Wiener, “Generalized harmonic analysis,” Acta Math., Vol. 55 (1933), pp. 117-258.

[9] H. Wold, A Study in the Analysis of Stationary Time Series, Dissertation (Stockholm), Uppsala, 1938.

[10] H. Wold, “On infinite, non-negative definite, Hermitian matrices, and corresponding linear equation systems,” Arkiv Mat. Astr. Fys., Vol. 29A (1943).

[11] V. Zasuhin, “On the theory of multidimensional stationary random processes,” C.R. (Doklady) Akad. Sci. URSS, N.S., Vol. 33 (1941), pp. 435-437.



GENERALIZATION TO N DIMENSIONS OF INEQUALITIES OF 
THE TCHEBYCHEFF TYPE 


By Burton H. Camp 
Wesleyan University 

1. Summary. The Tchebycheff statistical inequality and its generalizations 
are further generalized so as to apply equally well to n-dimensional probability 
distributions. Comparisons may be made with other generalizations [1], [2]
that have been developed recently for the two-dimensional case. The inequal¬ 
ities given in this paper are generally as close as the most favorable corresponding 
inequalities that exist for the one-dimensional case and in many simple cases 
they are closer than those that have been given heretofore for two dimensions. 
In a special case the upper bound of our inequality is actually attained. The 
theory contains also a less important generalization in one dimension. 

2. Introduction. It is necessary to introduce a new kind of moment, to be 
called a “contour” moment, which is a generalization of the usual one-dimensional 
moment. If we consider first a simple two-dimensional frequency surface, 
$y = f(t_1, t_2)$, we may think of y as a function of a single variable, x, where x is the
area of the contour on that surface at the y level. This function may be defined
so that it is monotonic decreasing and has other simple characteristics. Then
we define the rth contour moment as

$\hat{\mu}_r = \int x^r y \, dx,$

and then the generalization of the Tchebycheff-type inequalities follows easily. 
This theory can be applied equally well to almost any single-valued function of 
n variables which is limited and integrable in the sense of Lebesgue. Therefore 
the theory will be enunciated initially in a very general form. The reasons for 
the initial statements will be indicated only briefly because a detailed discussion 
of quite similar ideas has been given by this author in another paper [3], where 
he applied the same general principle to obtain generalizations of certain theo¬ 
rems in integration theory. 

3. Preliminary theory. Let $f(t_1, \cdots, t_n)$ be a probability distribution with
limited upper bound L and defined at all points of infinite n-space, which is to be
denoted by T, dT being the Lebesgue measure of a differential element. We
thus assume that: $0 \le f(t_1, \cdots, t_n) \le L$, f has a Lebesgue integral in T, and

$\int_T f \, dT = 1.$

Let $Q_\lambda$ denote the set of points in T where $f > \lambda$, $(0 \le \lambda \le L)$, and let $x_\lambda$ be the
measure of $Q_\lambda$, for $Q_\lambda$ is known to be measurable. Therefore $x_L = 0$, $x_0 \le \infty$,
and for each $\lambda$ there exists a unique $Q_\lambda$ and therefore a unique $x_\lambda$. This means
that x is a single-valued function of $\lambda$ and that it exists (or is positive infinite)
for every value of $\lambda$ in the interval $(0 \le \lambda \le L)$. If $\lambda' > \lambda$, $x_{\lambda'} \le x_\lambda$. This
means that x is a monotonic decreasing function of $\lambda$. It need not be continuous;
that is, it may be asymptotic to the line $\lambda = 0$, and it may have finite
discontinuities or “jumps”. Also there may be an enumerably infinite number of $\lambda$
intervals in which x is constant. It follows that $\lambda$ is a monotonic decreasing
function of x in the interval $(0 \le x \le x_0 \le \infty)$, but it may not exist (in intervals
where x has jumps), and it may be multiple valued (at points where x is constant).
We now let $y(x) = \lambda_x$, except that: if $\lambda$ is multiple valued at any point x we
let y have the minimum value of $\lambda$ at that point. Any other value would do
equally well because the total measure of such points is zero and they can be left
out of the integrals that follow. If $\lambda$ does not exist in an x interval, we let y have
in that interval the value which it has at the beginning of the interval. This is a
$\lambda$ point where x has a jump. We have thus defined y as a single valued
monotonic decreasing function of x in the interval $(0 \le x \le x_0 \le \infty)$ and $0 \le y \le L$.
It follows from Lebesgue’s theory that:

$\int_0^{x_\lambda} y(x) \, dx = \int_{Q_\lambda} f \, dT, \ (0 < \lambda \le L); \qquad \int_0^{x_0} y(x) \, dx = \int_T f \, dT = 1.$

Finally we restrict our function f so that there shall be at most a finite number
of points x where $\lambda$ is multiple valued (intervals of $\lambda$ over which x is constant),
and hence the number of discontinuities of y will be finite. This restriction may
not be necessary but it is convenient and not embarrassing in applications.

4. Contour moments. The rth contour moment is denoted by $\hat{\mu}_r$. The
contour standard deviation is denoted by $\hat{\sigma}$. We define

$\hat{\mu}_r = \int_0^{x_0} x^r y \, dx.$

It follows that $\hat{\mu}_0 = 1$, and that

$\hat{\sigma}^2 = \hat{\mu}_2 = \int_0^{x_0} x^2 y \, dx.$

We shall also let $\hat{\alpha}_{2r} = \hat{\mu}_{2r}/\hat{\sigma}^{2r}$. We now assume that r is either zero or a positive
integer, but in much of what follows this assumption is not necessary.

Example 1. Let $f(t_1, t_2) = (2\pi)^{-1} e^{-(t_1^2 + t_2^2)/2}$. The equation, $f(t_1, t_2) = \lambda$,
defines a circular contour whose area is $x = \pi(t_1^2 + t_2^2) = -2\pi \log 2\pi\lambda$. Hence
$y = \lambda = (2\pi)^{-1} e^{-x/2\pi}$, and

$\hat{\mu}_r = \int_0^{\infty} x^r y \, dx = (2\pi)^r r!, \quad \hat{\sigma}^2 = 8\pi^2, \quad \hat{\alpha}_{2r} = (2r)!/2^r.$
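The closed forms of Example 1 can be checked by direct numerical integration. This is a sketch only: the Simpson-rule helper, the truncation of the infinite range at x = 400 and the step count are arbitrary choices made here, not part of the paper.

```python
import math

def simpson(f, a, b, m):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def y(x):
    """Contour function of Example 1: y = (2*pi)**-1 * exp(-x / (2*pi))."""
    return math.exp(-x / (2.0 * math.pi)) / (2.0 * math.pi)

def contour_moment(r, upper=400.0, m=40000):
    """Numerical value of the r-th contour moment, truncating the integral at `upper`."""
    return simpson(lambda x: x ** r * y(x), 0.0, upper, m)

moments = [contour_moment(r) for r in range(5)]
exact = [(2.0 * math.pi) ** r * math.factorial(r) for r in range(5)]
alpha_4 = moments[4] / moments[2] ** 2  # should agree with (2r)!/2**r at r = 2
```

The ratio `alpha_4` does not depend on the factor $2\pi$, which is why the standardized contour moments of Example 1 involve only r.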

5. Contour moments and one-dimensional moments. If n = 1 and if $f(t_1) = f(-t_1)$, then

$\hat{\mu}_{2r} = \int_0^{x_0} x^{2r} y \, dx = 2 \int_0^{x_0/2} (2t)^{2r} f(t) \, dt = \mu_{2r} \cdot 2^{2r},$

where $\mu_r$ is an ordinary moment. Hence also $\hat{\sigma} = 2\sigma$, $\hat{\alpha}_{2r} = \hat{\mu}_{2r}/\hat{\sigma}^{2r} = \mu_{2r} \cdot 2^{2r}/(\sigma^{2r} \cdot 2^{2r}) = \alpha_{2r}$.
It is to be noticed that, although $\hat{\alpha}_{2r} = \alpha_{2r}$, $\hat{\mu}_{2r} \ne \mu_{2r}$. One
could alter the definition so that these two moments would be equal by inserting
into the definition of contour moments the factor $2^n$, using $x/2^n$ in place of x,
but this would introduce a slight complication for a doubtful advantage.
Although it would seem to be desirable to define the even contour moments $\hat{\mu}_{2r}$
so that they would become the ordinary moments $\mu_{2r}$ in the symmetrical
one-dimensional case, such a definition would not make the two corresponding odd
moments equal, and it would not make the two even moments equal in the
non-symmetrical one-dimensional case. So it seems better not to introduce this
factor $2^n$, but to take note of the relationships that hold in the one-dimensional
case.

Theorem. Let

$P_\delta = \int_{Q_\lambda} f \, dT,$

where $\lambda$ is such that $x_\lambda = \delta\hat{\sigma}$. Then

$1 - P_\delta \le \hat{\alpha}_{2r} \Big/ \left( \delta \, \frac{2r + 1}{2r} \right)^{2r}.$

Corollary 1. In particular $1 - P_\delta \le \hat{\alpha}_{2r}/\delta^{2r}$.

Corollary 2. If r = 1, $1 - P_\delta \le 4/9\delta^2$. This theorem and these two
corollaries are minor generalizations even of the corresponding one-dimensional
inequalities, for it is no longer assumed that the probability distribution f(t)
has but one mode.

Proof of Theorem. Let $g(x) = y(x)$ if $0 \le x \le x_0 \le \infty$, let $g(x) = y(-x)$
if $-\infty \le -x_0 \le x \le 0$, and let $g(x) = 0$ elsewhere in $(-\infty, \infty)$. Then g(x)
has all the properties explicitly required of f(x) in a former paper by this author
[4] in which this theorem was proved for the one-dimensional case. That is:
g(x) is a frequency function whose mean is zero, and

$\int_{-\infty}^{\infty} g(x) \, dx = 2, \quad \text{and} \quad \int_{\delta\hat{\sigma}}^{\infty} g(x) \, dx$

is the probability that $|x| > \delta\hat{\sigma}$; g(x) is a monotonic decreasing function of
$|x|$ for all values of x; and is symmetrical with respect to the central ordinate.
Therefore, transforming the symbols of that paper to our present notation, we
have

$\int_{\delta\sigma}^{\infty} g(x) \, dx \le \alpha_{2r} \Big/ \left( \delta \, \frac{2r + 1}{2r} \right)^{2r},$

where

$\sigma^2 = \int_0^{x_0} x^2 y \, dx = \hat{\sigma}^2.$

Similarly $\mu_{2r} = \hat{\mu}_{2r}$, $\alpha_{2r} = \hat{\alpha}_{2r}$, and finally

$1 - P_\delta = 1 - \int_0^{\delta\hat{\sigma}} y \, dx = 1 - \int_0^{\delta\hat{\sigma}} g \, dx = \int_{\delta\hat{\sigma}}^{\infty} g \, dx.$

This proves the theorem except that there is one exceptional case that requires 
attention. In the proof of the theorem in the paper just referred to the author 
assumed that the function corresponding to our present g(x) was continuous. 
At that time a “frequency” function was often thought of as determined by a 
smooth curve approximating a histogram and implied even the existence of 
derivatives, and so continuity was not added to the explicit requirement that 
the function be a “frequency” function, but this condition was explicitly intro¬ 
duced in the lemma on which the proof of the theorem was based, and so we do 
now have to consider separately the case where y, and hence g, may have a finite 
number of jumps. It is quite easy to handle this case as the limiting form of a 
continuous case. In that lemma it was also required that d 2 Q/dt 2 should exist 
and be non-negative, which would imply that we now have to make the require¬ 
ment that y (corresponding to dQ/dt) shall have a non-negative first derivative. 
On examination of the proof, however, it will be observed that this is not neces¬ 
sary, since y is monotonic decreasing and continuous. That is, in the lemma the 
only use made of the condition, d 2 Q/dt 2 ^ 0 , was that the function Q(t) should 
determine a curve which would be never concave down. But for this it is 
sufficient that dQ/dt be continuous and monotonic increasing, and these
conditions are now satisfied by the function which plays the rôle of Q in the present
discussion. This function will now be defined as

$\int y(x) \, dx.$


Let $\bar{y}(x)$ be a continuous function defined as equal to g(x) except in the
neighborhood of the points of finite discontinuity. Near such points it is to be so
defined that it shall have all the properties just required of g(x), and in addition
so that, for any prescribed R > 1 and $\epsilon > 0$,

$\int_0^{\infty} x^{2r} \bar{y}(x) \, dx = \int_0^{\infty} x^{2r} g(x) \, dx + \eta_r, \quad (1 \le r \le R);$

$\int_{\delta\hat{\sigma}}^{\infty} \bar{y}(x) \, dx = \int_{\delta\hat{\sigma}}^{\infty} g(x) \, dx + \eta,$

where $|\eta|, |\eta_r| < \epsilon$. It is obvious that such a definition of $\bar{y}$ may be made in
many ways, and one of them is by making use of a linear function in the
neighborhood of each point of discontinuity. Since $\bar{y}(x)$ now satisfies all the
conditions of the author’s earlier paper the corresponding inequality is true:



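$\int_{\delta\bar{\sigma}}^{\infty} \bar{y} \, dx \le \bar{\alpha}_{2r} \Big/ \left( \delta \, \frac{2r + 1}{2r} \right)^{2r},$

where

$\bar{\sigma}^2 = \int_0^{\infty} x^2 \bar{y} \, dx.$

Hence

$\left( \int_{\delta\hat{\sigma}}^{\infty} g \, dx + \eta \right) \left( \delta \, \frac{2r + 1}{2r} \right)^{2r} (\hat{\sigma}^2 + \eta_1)^r \le \hat{\mu}_{2r} + \eta_r.$

Let $\epsilon$ approach zero and we have, as desired:

$1 - P_\delta \le \hat{\alpha}_{2r} \Big/ \left( \delta \, \frac{2r + 1}{2r} \right)^{2r}.$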

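Example 2. Let

$f(t_1, \cdots, t_n) = \frac{1}{A} \exp\left\{ -\frac{1}{2} \left( \frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2} \right) \right\}, \quad A = (2\pi)^{n/2} (\sigma_1 \cdots \sigma_n).$

This is a form into which the general correlation solid may be put by means of a
linear transformation. Since $P_\delta$ is a ratio between two parts of such a solid and
since this ratio is preserved under a linear transformation, the more general case
may be transformed into this one, or even, as will appear shortly, into the simpler
one where all the standard deviations are unity. If $f = \lambda$ the contour is the
ellipsoid,

$\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2} = -2 \log \lambda A.$

The volume of this ellipsoid is

$x = h(-2 \log \lambda A)^{n/2}, \quad h = V_0 \sigma_1 \cdots \sigma_n, \quad V_0 = \frac{2\pi^{n/2}}{n\Gamma(n/2)}.$

Hence $y = A^{-1} e^{-\frac{1}{2}(x/h)^{2/n}}$,

$\hat{\mu}_r = \int_0^{\infty} x^r y \, dx = nA^{-1} h^{r+1} 2^{n(r+1)/2 - 1} \, \Gamma\left( \frac{nr + n}{2} \right) = \left( \frac{2^{(n+2)/2} \pi^{n/2} \sigma_1 \cdots \sigma_n}{n} \right)^r \frac{\Gamma\left( \frac{nr + n}{2} \right)}{[\Gamma(n/2)]^{r+1}}.$

Putting r = 2 we obtain

$\hat{\sigma}^2 = \frac{\pi^n 2^{n+2} (\sigma_1 \cdots \sigma_n)^2}{n^2} \, \frac{\Gamma(3n/2)}{[\Gamma(n/2)]^3},$

and then

$\hat{\alpha}_{2r} = \frac{\Gamma\left( \frac{2nr + n}{2} \right)}{\Gamma(n/2)} \left[ \frac{\Gamma(n/2)}{\Gamma(3n/2)} \right]^r.$

Our inequality becomes: $1 - P_\delta \le J$, where

$J = \hat{\alpha}_{2r} \Big/ \left( \delta \, \frac{2r + 1}{2r} \right)^{2r}$, or 1, whichever is smaller.

Typical numerical values of $\hat{\alpha}_{2r}$ and of J are given in Tables I and II.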


TABLE I
Values of $\hat{\alpha}_{2r}$

n    $\hat{\alpha}_{2r}$                      $\hat{\alpha}_2$    $\hat{\alpha}_4$    $\hat{\alpha}_6$
1    1·3·5···(2r − 1)              1      3        15
2    (2r)!/2^r                     1      6        90
3    3·5·7···(6r + 1)/(3·5·7)^r    1      12.26    566
4    (4r + 1)!/(5!)^r              1      25.20    3604


TABLE II
Values of J

δ    n    r    J
1    1    1    0.444
          2    1.000
1    2    1    0.444
          2    1.000
2    1    1    0.111
          2    0.077
          3    0.093
3    1    1    0.049
          2    0.015
          3    0.008
          4    0.006
          5    0.006
3    2    1    0.049
          2    0.030
          3    0.049
3    3    1    0.049
          2    0.062
          3    0.308
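The tabulated values can be recomputed from the gamma-function expression for the standardized contour moments of Example 2 and from the definition of J. The sketch below uses only Python's `math.gamma`; the function names `alpha_hat` and `bound_j` are hypothetical labels introduced here.

```python
import math

def alpha_hat(n, r):
    """Standardized contour moment of order 2r for the n-dimensional normal solid."""
    ratio = math.gamma(n / 2.0) / math.gamma(3.0 * n / 2.0)
    return math.gamma((2 * n * r + n) / 2.0) / math.gamma(n / 2.0) * ratio ** r

def bound_j(delta, n, r):
    """J = alpha_hat / (delta * (2r + 1) / (2r))**(2r), or 1, whichever is smaller."""
    return min(1.0, alpha_hat(n, r) / (delta * (2 * r + 1) / (2 * r)) ** (2 * r))

# Rows of Table I for r = 1, 2, 3 and a few rows of Table II, as (delta, n, r) -> J.
table_one = {n: [alpha_hat(n, r) for r in (1, 2, 3)] for n in (1, 2, 3, 4)}
table_two = {key: bound_j(*key) for key in [(1, 1, 1), (2, 1, 2), (3, 1, 5), (3, 3, 3)]}
```

Since J is not monotone in r, one would in practice take the smallest J over the available values of r for the given δ and n.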






Let us now compare J with the true value of $(1 - P_\delta)$ in one of these cases,
viz., when $\delta = 3$ and n = 3. The true value is given by

$1 - P_\delta = 1 - A^{-1} \int_0^{\delta\hat{\sigma}} e^{-\frac{1}{2}(x/h)^{2/3}} \, dx,$

where now $\hat{\sigma} = 4\pi\sqrt{105}\,(\sigma_1\sigma_2\sigma_3)/3$, $h = 4\pi(\sigma_1\sigma_2\sigma_3)/3$. The integral may be
evaluated by means of the transformation, $t = (x/h)^{1/3}$ and a table of the integral
of $(2\pi)^{-1/2} e^{-t^2/2} (t^2 - 1)$. We obtain: $1 - P_3 = 0.0205$. This is the true value
to be compared with the approximation, J = 0.049. The closeness of this
approximation is similar to that which may be obtained for the normal law by
using the corresponding inequalities for one dimension. To illustrate this we
find from the usual tables that, if for the normal law $1 - P_\delta = 0.0205$, $\delta = 2.32$.
Hence the corresponding inequality is (for r = 2): $1 - P_\delta \le 0.042$.
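The true value can also be evaluated without tables. The sketch below assumes the substitution $t = (x/h)^{1/3}$ and one integration by parts, which give $1 - P_\delta = \operatorname{erfc}(t_0/\sqrt{2}) + \sqrt{2/\pi}\, t_0 e^{-t_0^2/2}$ with $t_0 = (\delta\hat{\sigma}/h)^{1/3} = (\delta\sqrt{105})^{1/3}$; this closed form and the function name are worked out here and are not quoted from the paper.

```python
import math

def tail_outside_contour(delta):
    """1 - P_delta for the three-dimensional normal solid, from the assumed closed form."""
    t0 = (delta * math.sqrt(105.0)) ** (1.0 / 3.0)  # t0 = (delta * sigma_hat / h)**(1/3)
    return math.erfc(t0 / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * t0 * math.exp(-t0 * t0 / 2.0)

true_value = tail_outside_contour(3.0)

# One-dimensional comparison for r = 2, using alpha_4 = 3 for the normal law and delta = 2.32.
one_dim_bound = 3.0 / (2.32 * 5.0 / 4.0) ** 4
```

The computed tail should land close to the tabulated 0.0205, small differences being attributable to table rounding, and well below the bound J = 0.049.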

We shall now show that the upper bound of our inequality is actually attained
in a special case. Let $f(t_1, \cdots, t_n) = 2^{-n}$ in the region $(-1 \le t_1, \cdots, t_n \le 1)$,
and let f = 0 elsewhere. For this case we shall have x = 0 when $\lambda = 2^{-n}$, and
$x = 2^n$ when $0 \le \lambda < 2^{-n}$. Therefore $y = 2^{-n}$ if $0 \le x < 2^n$, and y = 0
if $2^n \le x$. Hence $\hat{\sigma} = 2^n/\sqrt{3}$, $\hat{\mu}_0 = 1$, and the true value of $(1 - P_\delta)$ is
$1 - \delta/\sqrt{3}$; and when $\delta = 2/\sqrt{3}$, this true value is 1/3. The appropriate
inequality is: $1 - P_\delta \le 4/9\delta^2$, and when $\delta = 2/\sqrt{3}$ the right hand side of this
inequality is also equal to 1/3. These relationships are true for all values of n.

REFERENCES 

[1] P. O. Berge, “A note on a form of Tchebycheff's inequality for two variables,” Biometrika, Vol. 29 (1937), pp. 405-406.

[2] Z. W. Birnbaum, J. Raymond, and H. S. Zuckerman, “A generalization of Tchebycheff's inequality to two dimensions,” Annals of Math. Stat., Vol. 18 (1947), pp. 70-79.

[3] B. H. Camp, “A method of extending to multiple integrals properties of simple integrals,” Math. Ann., Vol. 75 (1914), pp. 274-289.

[4] B. H. Camp, “A new generalization of Tchebycheff's statistical inequality,” Amer. Math. Soc. Bull., Vol. 28 (1922), pp. 427-432. See also: “A note on Narumi's paper,” Biometrika, Vol. 15 (1923), pp. 421-423.



BOUNDARIES OF MINIMUM SIZE IN BINOMIAL SAMPLING 

By R. L. Plackett 
University of Liverpool 

1. Introduction. Much attention has recently been concentrated on the problems
arising when sampling a binomial population, since this is thought to form a
suitable model for certain industrial and biological procedures. A general
discussion of such procedures as applied in industry has been given by Barnard
[2] and various particular cases have received detailed treatment by Burman [3],
Stockman and Armitage [6], and Anscombe [1]. Unbiased estimation of the
population parameter (the “fraction defective”) has been investigated by
Girshick, Mosteller and Savage [4] and Wolfowitz [7]. A paper by Haldane [5]
is also relevant.

For such sampling procedures it is necessary to find the probabilities of accept¬ 
ing or rejecting material with a particular fraction defective; to calculate the 
average sample size; and to form an estimate of the fraction defective when 
sampling terminates. All three characteristics may be expressed in terms of 
quantities N(x , y), defined in section 3, so that once these are known, the funda¬ 
mental properties of the scheme are known. 

Here we present a method for determining the N(x , y); investigate the condi¬ 
tions under which it is valid; relate the method to the estimation problem; and 
exemplify its application. The schemes to which the method can successfully 
be applied are of a special type (to which the title refers) and include all inspec¬ 
tion procedures with a finite upper limit to the sample size likely to be used in 
practice. Other schemes, when dissected in a manner similar to that used by 
Stockman and Armitage, can doubtless be formulated as an aggregate of the 
special types. 

2. Nomenclature. Our nomenclature differs in some respects from that of 
Girshick, Mosteller and Savage, although the same collection of terms is em¬ 
ployed. References to their paper should therefore be followed by a comparison 
of the terminology. 

Taking a sample of one from a binomial population consists in observing either
of two events, whose probabilities are p and 1 − p ($p \ne 0$ or 1). The results
of successive samples of one can be represented by the path of a particle in a
two-dimensional lattice of points with non-negative integer co-ordinates. This
particle starts at the origin O and at any point (x, y) travels to (x + 1, y) if the
event whose probability is p has occurred, otherwise to (x, y + 1). Sampling
terminates when the particle reaches a boundary point, and the set of such
points is denoted by B. Any point which can be reached during sampling,
including the boundary points, is accessible, and any path from the origin to a
point of B which can be traversed during sampling is admissible; all other points
are inaccessible and all other paths inadmissible. The index of a point is the sum
of its co-ordinates.
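The lattice description translates directly into a path count. The following sketch counts the admissible paths to each boundary point by moving outward one index at a time; the function name `count_paths` and the example boundary (sampling stops as soon as either co-ordinate reaches 2) are illustrative assumptions.

```python
from fractions import Fraction

def count_paths(boundary):
    """Number of admissible paths from the origin to each accessible point of the boundary."""
    boundary = set(boundary)
    n = max(x + y for x, y in boundary)      # maximum index of the boundary points
    reach = {(0, 0): 1}                      # number of ways of reaching each accessible point
    counts = {}
    for index in range(n + 1):
        for x in range(index + 1):
            y = index - x
            ways = reach.get((x, y), 0)
            if ways == 0:
                continue                     # inaccessible point
            if (x, y) in boundary:
                counts[(x, y)] = ways        # sampling terminates here
            elif index == n:
                raise ValueError("a point of index greater than n would be accessible")
            else:
                for nxt in ((x + 1, y), (x, y + 1)):
                    reach[nxt] = reach.get(nxt, 0) + ways
    return counts

example_boundary = [(2, 0), (2, 1), (0, 2), (1, 2)]
paths = count_paths(example_boundary)

# Probability of terminating somewhere on the boundary, for an arbitrary p.
p = Fraction(1, 3)
total_probability = sum(c * p ** x * (1 - p) ** y for (x, y), c in paths.items())
```

For a scheme that is certain to terminate, the termination probabilities add up to 1 whatever the value of p.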




It will probably help to note in particular that whereas Girshick, Mosteller and 
Savage used p to correspond to events causing the y co-ordinate to increase, we 
use it for x. 

3. Determination of N(x, y). The set B determines the sampling scheme and
we are concerned with schemes in which all points of index greater than n,
the finite maximum index of points in B, are inaccessible. This condition
guarantees that if N(x, y) denotes the number of admissible paths from the origin to
a point (x, y) of B

$\sum_B N(x, y) \, p^x (1 - p)^y \equiv 1,$

the summation being over all boundary points. Consequently, to determine
N(x, y) equate coefficients of p in this identity, the coefficient of $p^0$ in the left
hand side being 1 and all others zero. When all the N(x, y) are known, the
probability of reaching any subset of B can be calculated and the characteristics
of the scheme found.
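Equating coefficients amounts to solving a linear system, one equation for each power of p. The sketch below does this in exact rational arithmetic for a boundary with exactly n + 1 points; the function names and the example boundary (sampling stops as soon as either co-ordinate reaches 2) are assumptions made for illustration, and the elimination routine is a generic one rather than a procedure taken from the paper.

```python
from fractions import Fraction
from math import comb

def poly_coeffs(x, y, n):
    """Coefficients of p**x * (1 - p)**y in ascending powers p**0, ..., p**n."""
    coeffs = [Fraction(0)] * (n + 1)
    for j in range(y + 1):
        coeffs[x + j] += (-1) ** j * comb(y, j)
    return coeffs

def solve_path_counts(boundary):
    """Solve sum_B N(x, y) p**x (1 - p)**y = 1 for N(x, y) by equating coefficients of p."""
    n = max(x + y for x, y in boundary)
    m = n + 1
    if len(boundary) != m:
        raise ValueError("this sketch needs exactly n + 1 boundary points")
    cols = [poly_coeffs(x, y, n) for x, y in boundary]
    # Row i is the equation for the coefficient of p**i; the right side is 1 for i = 0, else 0.
    aug = [[cols[j][i] for j in range(m)] + [Fraction(int(i == 0))] for i in range(m)]
    for col in range(m):
        piv = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("the equations are not independent")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return {pt: aug[j][m] for j, pt in enumerate(boundary)}

counts = solve_path_counts([(2, 0), (2, 1), (0, 2), (1, 2)])
```

For this boundary the solution agrees with a direct enumeration of paths, for example two paths reach (2, 1), namely those passing through (1, 1).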

Sometimes it will be convenient to use

$\sum_B N(x, y) \, q^y (1 - q)^x \equiv 1,$

where q = 1 − p, but the resulting set of equations cannot be independent of the
first set since if

$\sum_{i=0}^{n} a_i p^i \equiv \sum_{j=0}^{n} b_j (1 - p)^j,$

then

$\sum_{i=0}^{n} a_i (1 - q)^i \equiv \sum_{j=0}^{n} b_j q^j.$

The polynomial in either p or q is of degree n; the application of this method
alone is therefore limited to boundaries containing at most (n + 1) points,
otherwise the number of unknowns exceeds the number of equations for them.

4* Properties of the boundary. 

Theorem 1. If n is the maximum index of points in B and if any point of 
greater index is inaccessible, then B contains at least n + 1 points . 

There must be at least two boundary points of index n for any such point 
(a* , b n ) must be approached from (a» — 1 , &„) or (a„ , 6 « — 1 ); in which case 
either (a n — 1, b H + 1) or (o n + 1, b n — 1) is a boundary point. Let P be any 
one of these points. At least one admissible path exists from 0 to P; suppose 
one such path to consist of the points (ao, bo), (ai, bi) , • • • , (a*, 6 n ) where 
a* + b h =» k (k =» 0,1,2, • • •, n). It is clear that one or more boundary points exist 
on the line x — a h , having y > bk , for otherwise the particle could travel indefi¬ 
nitely along this line; similarly one or more exist on y * b k with x> a k ; and if 



BOUNDARIES OP MINIMUM SIZE 


577 


there is just one on each they cannot be identical unless k = n since (a* , 6 *) is 
not then a boundary point. Initially (cto , bo) contributes two boundary points; 
since then either a*+i = a* and fo+i ^ 6 * or a*+i 7* a k and bh+i * h it follows 
that each succeeding point up to and including (a n _ 1 , bn- 1 ) contributes at least 
one more; the point (a n , b n ) is counted as soon as x reaches a n or y reaches b n , 
whichever occurs first. Consequently there are at least n + 1 boundary points. 

Reversely, if the boundary contains n + 1 points whose maximum index is 
m, such that any point of greater index is inaccessible, then m < n. For suppose 
m > n and apply the preceding result. 

An important class of boundaries therefore comprises those with the minimum number of points necessary to attain a given maximum index; they may conveniently be termed boundaries of minimum size, and for them alone the method of equating coefficients yields a number of equations equal to the number of unknowns, the first being otherwise less than the second.

If there are exactly $n + 1$ boundary points then $(a_1, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1})$ must each contribute just one; since $a_{k+1} = a_k$ or $a_k + 1$ there is one point of $B$ on each of the lines $x = 0, x = 1, \ldots, x = a_n$, and this set of points $(0, d_0), (1, d_1), \ldots, (a_n, b_n)$ can be denoted by $U$, the upper part of the boundary. Clearly $d_{k+1} \ge d_k - 1$, for otherwise more than one boundary point is required on the line $x = k + 1$. Similarly, there must be a second group of points of $B$, $(c_0, 0), (c_1, 1), \ldots, (a_n, b_n)$ with $c_{k+1} \ge c_k - 1$, forming the lower boundary $L$; and all $(n + 1)$ points have now been enumerated, the point $P$ belonging to both $U$ and $L$. The characteristic of such sets $B$ is that the sequences $U$ and $L$ both have monotonically non-decreasing index; the special case of sequences with monotonically increasing index provides the rejection and acceptance boundaries of non-rectifying industrial inspection procedures. (The difference between rectifying and non-rectifying procedures is clearly stated in the introduction to Anscombe [1].)

Theorem 2. For boundaries of minimum size any two accessible points not in $B$ of the same index $m$ cannot be separated on the line $x + y = m$ by boundary or inaccessible points. In the terminology of Girshick, Mosteller and Savage [4] the accessible points not in $B$ form a simple region.

Let $Q(x_1, y_1)$ and $R(x_2, y_2)$ be any two such accessible points of index $m$ and suppose $x_1 < x_2$. There are two possibilities: $(a_m, b_m)$ does or does not lie between $Q$ and $R$.

(i) $(a_m, b_m)$ lies between $Q$ and $R$, i.e. $x_1 \le a_m \le x_2$. In this case there must be points of $B$ at $Q'(x_1, Y_1)$ with $Y_1 > y_1$ and at $R'(X_2, y_2)$ with $X_2 > x_2$. The boundary from $Q'$ to $P$ and from $R'$ to $P$ has non-decreasing index; hence all points of $U$ on the lines $x = x_1, x = x_1 + 1, \ldots, x = a_m - 1$ have index at least $x_1 + Y_1 > m$; similarly all points of $L$ on the lines $y = y_2, y = y_2 + 1, \ldots, y = b_m - 1$ have index at least $X_2 + y_2 > m$. By definition of the boundary there are no additional points of $B$ on either group of lines between the path $OP$ and the line $x + y = n$, so the proof of the theorem is completed.

(ii) If $x_1 > a_m$ or $x_2 < a_m$ the proof is precisely analogous to that given in (i).





5. Justification of the method. Theorem 3. For boundaries of minimum size 
the equations for N(x, y) are soluble and of rank n + 1. 

To prove this we give a general method of solution for the system of equations, using powers of $p$ and $q$ alternately; as already remarked, this is equivalent to using the equations from the coefficients of powers of $p$ only. In the first place, note that the coefficient of $q^u$ is a linear combination of numbers $N(x, y)$ with $x + y \ge u$ and $y \le u$; and the coefficient of $p^t$ has $x + y \ge t$ and $x \le t$.

Let $s = \min(d_0, d_1, d_2, \ldots, b_n) - 1$.

Then from the coefficients of $q^0, q^1, \ldots, q^s$ can successively be determined $N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$, the matrix of the equations being triangular with ones in the main diagonal. The points in $U$ at $(r_1, s + 1), (r_2, s + 1), \ldots$ now appear in the coefficients of $q^{s+1}, q^{s+2}, \ldots$ and complicate the solution.

Let $r = \max(r_1, r_2, \ldots)$.

If either $(r, d_r)$ or $(c_s, s)$ is the point $P$ then all the remaining $N(x, y)$ can successively be determined from the coefficients of powers of $p$ when the values of $N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$ are substituted in the equations. Otherwise the path $OP$ for $y \ge s + 1$ must have $x \ge r + 1$, so that all points of $L$ on $y \ge s + 1$ have $x \ge r + 2$; i.e. any point of $L$ on $x = 0, x = 1, \ldots, x = r$ has $y \le s$, and for such points the number of admissible paths is now known. Therefore from the coefficients of $p^0, p^1, \ldots, p^r$ can successively be determined $N(0, d_0), N(1, d_1), \ldots, N(r, d_r)$, the matrix of these unknowns being again triangular; in particular $N(r_1, s + 1), N(r_2, s + 1), \ldots$ can now be found.

Let $s_1 = \min(d_{r+1}, d_{r+2}, \ldots, b_n) - 1$, so that $s_1 > s$. The coefficients of $q^{s+1}, q^{s+2}, \ldots, q^{s_1}$ give successively $N(c_{s+1}, s + 1), N(c_{s+2}, s + 2), \ldots, N(c_{s_1}, s_1)$. For the points in $U$ at $(r_{11}, s_1 + 1), (r_{12}, s_1 + 1), \ldots$ let

$r_1 = \max(r_{11}, r_{12}, \ldots)$.

Since there is only one point of $U$ on each line $x = $ constant, $r_1 > r$. As before, if either $(r_1, d_{r_1})$ or $(c_{s_1}, s_1)$ is $P$ the remaining points of $U$ are soon determined. Otherwise the process continues and there result an increasing sequence of points of $L$ and a similar sequence for $U$; the process terminates when $(a_n, b_n)$ has been reached in both, when all $N(x, y)$ will have been found.

It is clear that for particular cases alternative methods of solution will prove 
more convenient. 

6. Connection with estimation. Suppose that the point $(t, u)$ is accessible and let $N^*(x, y)$ be the number of admissible paths from $(t, u)$ to $(x, y)$ where $(x, y)$ is in $B$. Then Girshick, Mosteller and Savage [4] have shown that $N^*(x, y)/N(x, y)$ is an unbiased estimate of $p^t(1 - p)^u$; and a necessary and sufficient condition for it to be the unique unbiased estimate is that the accessible points not in $B$ form a simple finite region. Hence from Theorem 2 such estimates are unique for schemes with boundaries of minimum size. An alternative proof is given by considering that if two unbiased estimates of any function of $p$ exist and $f(x, y)$ is the difference between them at $(x, y)$, then

$$\sum_{B} f(x, y)\,N(x, y)\,p^x(1 - p)^y \equiv 0,$$

where $f(x, y)$ is not everywhere zero. The equations formed by equating coefficients have rank $(n + 1)$ as shown by Theorem 3, so that the only solution is $f(x, y)N(x, y) = 0$. Since each $N(x, y)$ is certainly positive it follows at once that $f(x, y) = 0$ and there can only be one unbiased estimate.
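The estimate $N^*(x, y)/N(x, y)$ can be checked numerically on a small closed boundary. The following is a minimal Python sketch and not part of the original paper: the function names and the example boundary {(2, 0), (2, 1), (1, 2), (0, 2)} are illustrative assumptions, and a path is assumed to move one unit in $x$ (with probability $p$) or one unit in $y$ (with probability $1 - p$) until it first reaches $B$.

```python
from fractions import Fraction


def count_paths(start, boundary):
    """Count admissible paths from `start` to each point of `boundary`.

    A path steps to (x + 1, y) or (x, y + 1) and stops on reaching the boundary.
    """
    max_index = max(x + y for x, y in boundary)
    counts = {start: 1}
    reached = {}
    for index in range(start[0] + start[1], max_index + 1):
        for x in range(index + 1):
            point = (x, index - x)
            ways = counts.get(point, 0)
            if ways == 0:
                continue
            if point in boundary:
                reached[point] = ways      # the path stops here
                continue
            for nxt in ((x + 1, index - x), (x, index - x + 1)):
                counts[nxt] = counts.get(nxt, 0) + ways
    return reached


def estimates(boundary, t, u):
    """N*(x, y) / N(x, y) at each boundary point, with N* counted from (t, u)."""
    total = count_paths((0, 0), boundary)
    partial = count_paths((t, u), boundary)
    return {pt: Fraction(partial.get(pt, 0), n) for pt, n in total.items()}


def expectation(boundary, values, p):
    """Expected value of `values` over the boundary for success probability p."""
    total = count_paths((0, 0), boundary)
    return sum(values[pt] * n * p ** pt[0] * (1 - p) ** pt[1]
               for pt, n in total.items())


# Hypothetical example: stop as soon as x = 2 or y = 2 (four points, maximum index 3).
example_boundary = {(2, 0), (2, 1), (1, 2), (0, 2)}
example_estimates = estimates(example_boundary, 1, 0)   # estimate of p
```

For this boundary and $(t, u) = (1, 0)$ the estimates at $(2, 0)$, $(2, 1)$, $(1, 2)$ and $(0, 2)$ are 1, 1/2, 1/2 and 0, and their expectation reduces to $p$ for any $p$.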

7. An illustration. As an application of the method we take the interesting rectifying sequential inspection scheme discussed by Anscombe [1]. The boundary points are at $(H, 0), (H + b, 1), \ldots, (H + \eta b, \eta)$, where $\eta$ is the greatest integer less than $(N - H)/(b + 1)$, and thereafter on the line $x + y = N$. The equations for $N(x, y)$ take here their simplest form, namely equation (4) of Barnard's paper [2]. From the coefficients of $q^0, q^1, \ldots, q^y, \ldots$,

$$1 = N(H, 0);$$

$$0 = N(H + b, 1) - H\,N(H, 0), \quad \text{whence } N(H + b, 1) = H;$$

$$0 = N(H + 2b, 2) - (H + b)H + \binom{H}{2}, \quad \text{whence } N(H + 2b, 2) = \frac{H(H + 2b + 1)}{2!};$$

$$0 = N(H + 3b, 3) - (H + 2b)\frac{H(H + 2b + 1)}{2!} + \binom{H + b}{2}H - \binom{H}{3}, \quad \text{whence } N(H + 3b, 3) = \frac{H(H + 3b + 2)(H + 3b + 1)}{3!}.$$

It now appears reasonable to guess the general term as

$$N(H + yb, y) = \frac{H}{y!}(H + yb + y - 1)(H + yb + y - 2)\cdots(H + yb + 1).$$

The proof is therefore complete if we show

$$\binom{H}{y} - \binom{H + b}{y - 1}H + \binom{H + 2b}{y - 2}\frac{H(H + 2b + 1)}{2!} - \binom{H + 3b}{y - 3}\frac{H(H + 3b + 2)(H + 3b + 1)}{3!} + \cdots + (-1)^y\frac{H}{y!}(H + yb + y - 1)(H + yb + y - 2)\cdots(H + yb + 1) = 0.$$

Put $(b + 1) = \xi$, and the left-hand side, apart from the factor $H$, becomes

$$\frac{(H - 1)!}{(H - y)!\,y!} - \frac{(H + \xi - 1)!}{(H + \xi - y)!\,(y - 1)!\,1!} + \frac{(H + 2\xi - 1)!}{(H + 2\xi - y)!\,(y - 2)!\,2!} - \cdots + (-1)^y\frac{(H + y\xi - 1)!}{(H + y\xi - y)!\,y!},$$

which is $1/y$ times the coefficient of $t^{H-y}$ in $(1 + t)^{H + y\xi - 1}\,[(1 + t)^{-\xi} - t^{-\xi}]^y$.

Rewriting the latter as $(1 + t)^{H-1}\,[1 - (1 + t^{-1})^{\xi}]^y$, it becomes clear that the highest power of $t$ is $t^{H-y-1}$, whence the required result follows.
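The guessed general term can also be checked by direct enumeration of paths. The sketch below is illustrative only and not taken from the paper: the function names are assumptions, and the closing line $x + y = N$ is ignored by looking only at the first few boundary points $(H + yb, y)$. It counts admissible paths by dynamic programming and compares them with $\frac{H}{y!}(H + yb + y - 1)\cdots(H + yb + 1)$.

```python
from math import factorial


def path_counts(H, b, y_max):
    """Admissible path counts N(H + y*b, y) for y = 0..y_max (assumes H >= 1)."""
    boundary = {(H + y * b, y) for y in range(y_max + 1)}
    max_index = H + y_max * (b + 1)
    counts = {(0, 0): 1}
    result = {}
    for index in range(max_index + 1):
        for x in range(index + 1):
            y = index - x
            ways = counts.get((x, y), 0)
            if ways == 0:
                continue
            if (x, y) in boundary:
                result[y] = ways           # paths stop at the boundary
                continue
            for nxt in ((x + 1, y), (x, y + 1)):
                counts[nxt] = counts.get(nxt, 0) + ways
    return result


def general_term(H, b, y):
    """H/y! * (H + yb + y - 1)(H + yb + y - 2)...(H + yb + 1), evaluated exactly."""
    if y == 0:
        return 1                           # N(H, 0) = 1
    product = H
    for k in range(1, y):
        product *= H + y * b + k
    return product // factorial(y)


counts = path_counts(2, 1, 4)              # hypothetical values H = 2, b = 1
```

With $H = 2$ and $b = 1$ both the enumeration and the formula give 1, 2, 5, 14, 42 for $y = 0, \ldots, 4$.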

REFERENCES 

[1] F. J. Anscombe, “Linear sequential rectifying inspection for controlling fraction defective,” Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 216-222.

[2] G. A. Barnard, “Sequential tests in industrial statistics,” Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 1-21.

[3] J. P. Burman, “Sequential sampling formulas for a binomial population,” Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 98-103.

[4] M. A. Girshick, F. Mosteller, and L. J. Savage, “Unbiased estimates for certain binomial sampling problems with applications,” Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[5] J. B. S. Haldane, “On a method of estimating frequencies,” Biometrika, Vol. 33 (1946), pp. 222-225.

[6] C. M. Stockman and P. Armitage, “Some properties of closed sequential schemes,” Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 104-112.

[7] J. Wolfowitz, “On sequential binomial estimation,” Annals of Math. Stat., Vol. 17 (1946), pp. 489-492.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


NON-PARAMETRIC TOLERANCE LIMITS 1 

By R. B. Murphy 
Princeton University 

1. Summary. In this note are presented graphs of minimum probable population coverage by sample blocks determined by the order statistics of a sample from a population with a continuous but unknown cumulative distribution function (c.d.f.). The graphs are constructed for the three tolerance levels .90, .95, and .99. The number, $m$, of blocks excluded from the tolerance region runs as follows: $m = 1(1)6(2)10(5)30(10)60(20)100$, and the sample size, $n$, runs from $m$ to 500.

Thus the curves show the solution, $\beta$, of the equation $1 - \alpha = I_\beta(n - m + 1, m)$ for $\alpha = .90, .95, .99$ over the range of $n$ and $m$ given above, where $I_x(p, q)$ is Pearson’s notation for the incomplete beta function.

Examples are cited below for the one- and two-variate cases. Finally, the 
exact and approximate formulae used in computations for these graphs are given. 


2. Introduction. Suppose a sample of size $n$ is drawn from a population having a continuous cumulative distribution function (c.d.f.), $F(x)$. Let the sample values arranged in order of increasing magnitude be $x_1, x_2, \ldots, x_n$. The fraction, $u$, of the population which is included between $x_r$ (the $r$-th smallest value in the sample) and $x_{n-s+1}$ (the $s$-th largest value) is $F(x_{n-s+1}) - F(x_r)$. This quantity $u$ has been called the population coverage for the interval $(x_r, x_{n-s+1})$. The probability element for this coverage is

$$f(u)\,du = \frac{\Gamma(n + 1)}{\Gamma(n - m + 1)\,\Gamma(m)}\,u^{n-m}(1 - u)^{m-1}\,du, \tag{2.1}$$

where $m = r + s$. From (2.1) we can calculate the probability that this coverage is at least a given amount, say $\beta$. If we call this probability $\alpha$, we have

$$\alpha = \frac{\Gamma(n + 1)}{\Gamma(n - m + 1)\,\Gamma(m)} \int_\beta^1 u^{n-m}(1 - u)^{m-1}\,du. \tag{2.2}$$

The quantity $\alpha$ is the probability that at least $100\beta\%$ of the population will be included between $x_r$ and $x_{n-s+1}$, and it is called the tolerance level. For given $\beta$ this probability depends only on $n$ and $m$ $(= r + s)$.


1 All computations involved in this paper were carried out under an Office of Naval Research contract.




The idea of coverage is more general than it first appears. If we think of $x_1, x_2, \ldots, x_n$ as points plotted along the $x$-axis, we will then have $n + 1$ intervals: $(-\infty, x_1), (x_1, x_2), \ldots, (x_n, +\infty)$, which, following Tukey [3], we will call blocks. The reason for this term will be clear when we deal with the case of a sample from a population of more than one variable. The coverage for the $i$-th block $(x_i, x_{i+1})$ is $F(x_{i+1}) - F(x_i)$. The probability element of the sum of the coverages of any preassigned group of $n - m + 1$ blocks is given by (2.1), and hence the probability $\alpha$ that the fraction of the population covered by any such $n - m + 1$ blocks is at least $\beta$ is given by (2.2). By preassigned blocks we mean ones designated by order statistics prior to obtaining any sample from which a prediction is to be made with these blocks. In general it is not legitimate, after taking a sample and for some reason evident only then, to specify which blocks in this sample are to be included or excluded from the coverage. There is no objection, however, to specifying a scheme of blocks for the coverage on the basis of past samples when the scheme is to be applied to future samples.

The purpose of this note is to present graphs of $\beta$ as a function of $n$ for $m = 1(1)6(2)10(5)30(10)60(20)100$ and for $\alpha = .90, .95, .99$. There are three figures: Figure 1 gives curves for $\alpha = .90$, Figure 2 for $\alpha = .95$, and Figure 3 for $\alpha = .99$. The graphs are accurate to at least two decimal places but never more than three. In terms of the Pearson notation (2.2) gives, after minor alteration, $1 - \alpha = I_\beta(n - m + 1, m)$. Hence these graphs may also be used to find the 10, 5 and 1 per cent points of a variate $X$ $(0 \le X \le 1)$ with the c.d.f. $I_x(p, q)$ for $1 \le p \le 500$ and $1 \le q \le 100$.


3. Computations for the graphs. If in the relation (2.2) three of the arguments $\alpha$, $\beta$, $m$, and $n$ are given, the solution for the fourth may often be found in Pearson [5] or Thompson [6]. The values of $\beta$ through $n = 100$ were computed exactly for these graphs. For larger $n$, $\beta$ was computed approximately from


\/ (x« — 2m) 2 + 16n(n — m) — (x« — 2m) 


where x« is determined by the relation 


Pr( x 2 £ Xa) = 1 


and has $2m$ degrees of freedom. This approximation is due to Scheffé and Tukey. For large $m$ the Cornish-Fisher approximation to $\chi^2_\alpha$ was used.


4. Illustrations of the one-variate case. The most common use to which the graphs presented here may be put is in the prediction of $\beta$ in sampling from a distribution of a single random variable. It is this case that was first presented by Wilks [1]. Suppose in the mass production of a certain type of screw one is interested in the least proportion of all screws manufactured that have lengths between the least and greatest lengths appearing in a random sample of 100 screws. It is assumed that we do not know the distribution of the length, $X$, of a screw produced in this process. Furthermore, it is assumed, of course, that the manufacturing process is in a state of statistical control in the sense of Shewhart. We plan to discard two blocks, $(-\infty, x_1)$ and $(x_{100}, +\infty)$ (exactly as many blocks as observations). At the level $\alpha = .99$ we obtain from Figure 3 that at least 93.5% of all screws in the population sampled have lengths that fall between $x_1$ and $x_{100}$. If we now draw a random sample of 100 screws and find the least and greatest screw lengths to be 1.40 and 1.60 inches respectively, we may say that at least 93.5% of all screws from the population sampled have lengths between 1.40 and 1.60 inches at the .99 tolerance level. It must be observed that the prediction is made on the basis of preassigned order statistics, and not of the values 1.40 and 1.60.

Fig. 1. Graphs of Population Coverage for the Tolerance Level .90.

Fig. 2. Graphs of Population Coverage for the Tolerance Level .95.

We might equally well have put the question in another way: If we want at least 93.5% of the lengths of all screws to lie within the range of lengths of a sample of 100 screws, then at the tolerance level $\alpha = .99$ what is the smallest sample we could have in which as many as 2% of the sample are not acceptable? Examining the intersections of the curves in Figure 3 with the line $\beta = .935$ we choose the smallest $n$ such that $m/n \ge .02$ and find $n = 100$.
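The readings taken from the graphs in this section can be reproduced numerically. The sketch below is an illustration rather than the method used for the published curves: the function names are assumptions, and it relies on the standard identity that, for integer arguments, $I_\beta(n - m + 1, m)$ equals the binomial tail $\sum_{j=n-m+1}^{n} \binom{n}{j}\beta^j(1 - \beta)^{n-j}$.

```python
from math import comb


def tolerance_level(n, m, beta):
    """alpha = Pr(coverage >= beta) = 1 - I_beta(n - m + 1, m), integer n >= m >= 1."""
    tail = sum(comb(n, j) * beta ** j * (1 - beta) ** (n - j)
               for j in range(n - m + 1, n + 1))
    return 1.0 - tail


def coverage(n, m, alpha, iterations=60):
    """Solve tolerance_level(n, m, beta) = alpha for beta by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if tolerance_level(n, m, mid) >= alpha:
            lo = mid        # the level is still high enough: beta can be raised
        else:
            hi = mid
    return lo


def smallest_n(m, beta, alpha):
    """Smallest sample size whose n - m + 1 retained blocks reach level alpha."""
    n = m
    while tolerance_level(n, m, beta) < alpha:
        n += 1
    return n


beta_screws = coverage(100, 2, 0.99)   # the screw example: n = 100, m = 2
```

Under these assumptions `beta_screws` comes out close to .935, and `smallest_n(2, 0.935, 0.99)` returns 100, in agreement with the values read from Figure 3.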

5. The case of more than one variate. The ideas given in the introduction may be extended to sampling situations involving two or more statistically dependent variates with a continuous joint c.d.f. by means of the notion of blocks. The abstract formulation is given by Tukey [3]. We shall restrict ourselves to the case of two dependent variates $X$ and $Y$, but the generalization is obvious. Because of the dependence, the joint population of $X$ and $Y$ may be expressed as an associated pair of values $W = (X, Y)$. Suppose a sample of size $n$ is drawn from this population, and let the pairs be $w_1, w_2, \ldots, w_n$, where $w_i = (x_i, y_i)$. If we now choose a sequence of $n$ numerically valued functions of $x$ and $y$ (or of $w$), $f_1(w), \ldots, f_n(w)$, let us order the $w_i$ in a sequence $w_1^{(1)}, w_2^{(1)}, \ldots, w_n^{(1)}$ such that $f_1(w_{i+1}^{(1)}) > f_1(w_i^{(1)})$. Imagine now that the sample values are plotted in a plane scatter diagram. We call the first block the set of points $w = (x, y)$ such that $f_1(w) < f_1(w_1^{(1)})$. That is, we may imagine the curve $f_1(x, y) - f_1(w_1^{(1)}) = 0$ plotted in the plane and that the first block is bounded by this curve. Then discarding $w_1^{(1)}$ we take the $n - 1$ remaining $w_i$ and order them in a sequence $w_1^{(2)}, \ldots, w_{n-1}^{(2)}$ such that $f_2(w_{i+1}^{(2)}) > f_2(w_i^{(2)})$. We call the second block the set of points $w = (x, y)$ such that $f_1(w) > f_1(w_1^{(1)})$ and also $f_2(w) < f_2(w_1^{(2)})$.

Thus the second block is bounded by the curves $f_1(x, y) - f_1(w_1^{(1)}) = 0$ and $f_2(x, y) - f_2(w_1^{(2)}) = 0$. If we continue this process of discarding and reordering, until all $n$ functions $f_i$ are used, we shall obtain a division of the plane into $n + 1$ non-overlapping blocks, the “extra” block arising at the last step in the process. Then the fraction, $u$, of “points” $(X, Y)$ of the joint population of $X$ and $Y$ that are covered by any $n - m + 1$ blocks has the probability element (2.1). Also the probability $\alpha$ that the population coverage, $u$, will be at least as large as $\beta$ is given by (2.2). The $n - m + 1$ blocks constitute a tolerance region.
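The construction of blocks by successive discarding and reordering can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function name `block_number` and the toy samples are hypothetical, ties are assumed absent, and one ordering function is supplied for each sample point.

```python
def block_number(sample, functions, point):
    """Return the 1-based number of the block containing `point`.

    `sample` is a list of (x, y) pairs and `functions` a list of as many
    numerically valued functions of w = (x, y), fixed before sampling.
    """
    if len(functions) != len(sample):
        raise ValueError("one ordering function is needed per sample point")
    remaining = list(sample)
    for k, f in enumerate(functions, start=1):
        cut = min(remaining, key=f)        # the remaining point with smallest f_k
        if f(point) < f(cut):
            return k                       # point lies in the k-th block
        remaining.remove(cut)              # discard it and go on with the rest
    return len(functions) + 1              # the "extra" block of the last step


# Hypothetical one-variate style example: every f_i(w) = x.
line_sample = [(1, 0), (3, 0), (2, 0)]
by_x = [lambda w: w[0]] * 3
```

With `line_sample` and `by_x` the four blocks are the intervals cut off by $x = 1, 2, 3$, so a point with $x = 2.5$ falls in block 3, which recovers the one-variate case as a special case of the general scheme.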





An extension of this case has been made by Wald [2]. Namely, before a sample is taken let us choose a numerically valued function $f$ of $w$ and choose $k$ $(< n)$ of the $w_i$ and order them in a sequence $w_{a_1}, w_{a_2}, \ldots, w_{a_k}$ such that $f(w_{a_{j+1}}) > f(w_{a_j})$ and $a_{j+1} > a_j$. Next, within each “strip” of the $(x, y)$ plane such that $w = (x, y)$ satisfies $f(w_{a_{j+1}}) > f(w) > f(w_{a_j})$, suppose that we follow the construction in the previous paragraph. Then the population coverage, $u$, by $n - m + 1$ blocks from one or more of these strips or their exteriors has the probability element (2.1).

Again the warning must be made that the above functions $f, f_1, f_2, \ldots, f_n$, the numbers $a_1, a_2, \ldots, a_k$ and the sequence of construction must be completely specified before samples are drawn to which this scheme is to be applied.

6. Illustrations for two variates. As an example of the use of the graphs for a two-variate case, we use an example cited by Tippett [8]. The two variates are the percentage of pig iron, $X$, and the lime consumption, $Y$, per cwt. of steel in 100 steel castings made without slag control. A scatter diagram is given in Figure 4. Unfortunately the value of this example is lessened by the fact that the block schemes were made after the sample had been taken; it does illustrate, at least, the two simple types of scheme.

The tolerance region $T$ (solid lines in Figure 4) resulted from the following scheme: let $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = f_5(w) = f_6(w) = -y$. Now follow the Wald procedure choosing $f(w) = y$ with $k = 6$, and $a_1 = 1$, $a_2 = 13$, $a_3 = 46$, $a_4 = 75$, $a_5 = 90$, $a_6 = 96$. Then in each strip $y_{a_{j+1}} > y > y_{a_j}$ let $f_i(w) = x$. Considering only the blocks within the heavy line as the tolerance region, we have, by counting the discarded blocks, $m = 16$.

In constructing the region $T'$ (broken lines in Figure 4) we also use Wald’s method, taking $f(w) = y - 5x$ with $k = 2$ and $a_1 = 3$, $a_2 = 96$. In the exterior region with $f(w) > f(w_{a_2})$ let all $f_i = y + 5x$ and similarly in the exterior region $f(w) < f(w_{a_1})$. Then in the strip $f(w_{a_2}) > f(w) > f(w_{a_1})$ (i.e., in the region in which $41 > y - 5x > -77$) choose $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = -y$, $f_5(w) = f_6(w) = f_7(w) = y + 5x$, and $f_8(w) = f_9(w) = -y - 5x$. Counting the blocks outside the heavily bordered region, we have $m = 17$.

We obtain by interpolation $\beta = .80$ for $T$ and $\beta = .78$ for $T'$ at the $\alpha = .90$ level.

7. Ties. A tie is a sample point which in a coordinate system defining a set of order statistics coincides in one or more coordinates with other sample points. For instance, in the $X$ coordinate of our example (32, 159) and (32, 185) are tied, and (47, 218) and (47, 218) are tied in any system of coordinates. It would seem easier to avoid ties with regions of the type of $T'$ than with those of the type of $T$.

The existence of ties in the population is assumed impossible, because positive point probabilities would destroy the continuity of the c.d.f. Therefore we attribute the ties to the crudity of measuring devices.





A procedure for handling ties is given by Tukey [4].

8. Acknowledgments. The author wishes gratefully to acknowledge the assistance of Dr. S. S. Wilks in the preparation of this note and of Dr. J. W. Tukey, who also suggested the data used in section 6.



Fig. 4. Illustrative Tolerance Regions for Two Variates.


REFERENCES 

[1] S. S. Wilks, “Statistical prediction with special reference to the problem of tolerance limits,” Annals of Math. Stat., Vol. 13 (1942), pp. 400-409.

[2] A. Wald, “An extension of Wilks’ method for setting tolerance limits,” Annals of Math. Stat., Vol. 14 (1943), pp. 45-65.

[3] J. W. Tukey, “Non-parametric estimation II. Statistically equivalent blocks and tolerance regions: the continuous case,” Annals of Math. Stat., Vol. 18 (1947), pp. 529-539.

[4] J. W. Tukey, “Non-parametric estimation III. Statistically equivalent blocks and multivariate tolerance regions: the discontinuous case,” Annals of Math. Stat., Vol. 19 (1948), pp. 30-39.

[5] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge, 1934.

[6] C. M. Thompson, “Tables of percentage points of the incomplete beta function,” Biometrika, Vol. 32, Part II (1941), pp. 151-181.

[7] H. Goldberg and H. Levine, “Approximation formulas for the percentage points and normalization of $t$ and $\chi^2$,” Annals of Math. Stat., Vol. 17 (1946), pp. 216-225.

[8] L. H. C. Tippett, Statistical Methods in Industry, Iron and Steel Industrial Research Council, British Iron and Steel Federation, 1943.


THE FOURTH DEGREE EXPONENTIAL DISTRIBUTION 
FUNCTION 1 

By Leo A. Aroian 
Hunter College 

We shall derive a recursion formula for the moments of the fourth degree 
exponential distribution function, state its more characteristic features, and show 
how the graduation of observed distributions may be accomplished by the method 
of moments and the method of maximum likelihood. The purpose of the note 
is to make possible a wider use of this function. 

R. A. Fisher [1] introduced the fourth degree exponential function

$$y_t = k \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2 + \beta_1 t)\}, \tag{1}$$

where $r_1 < t < r_2$, $t = (x - m)/\sigma$, $m$ indicates the population mean, $\sigma$ the population standard deviation, and where the $\beta$’s are functions of

$$\alpha_n = \int_{r_1}^{r_2} t^n y_t\,dt.$$

A. L. O’Toole in two stimulating papers [2], [3], has studied (1); however his methods and results are unnecessarily complicated. O’Toole requires eight moments to determine parameters similar to the $\beta$’s. Both Fisher and O’Toole considered the restricted class of (1) with range $(-\infty, \infty)$.

1 Presented to the American Mathematical Society and the Institute of Mathematical Statistics, September 4, 1947.

Let

$$u = t^n \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2)\}, \qquad dv = e^{-\beta_1 t}\,dt \tag{2}$$

in

$$\alpha_n = \int_{r_1}^{r_2} t^n y_t\,dt, \tag{3}$$

obtaining

$$4\beta_4\alpha_{n+3} + 3\beta_3\alpha_{n+2} + 2\beta_2\alpha_{n+1} + \beta_1\alpha_n = n\alpha_{n-1}, \qquad n = 1, 2, 3, \ldots, \tag{4}$$

and for $n = 0$, the right side of (4) is defined as zero. The result (4) is valid under the assumption

$$[uv]_{r_1}^{r_2} = 0. \tag{5}$$
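The recursion (4) can be checked numerically for a particular choice of the $\beta$’s. The sketch below is illustrative only: the parameter values, the integration range $[-8, 8]$ and the function names are assumptions, and since (4) is linear and homogeneous in the $\alpha$’s the normalizing constant $k$ is simply omitted.

```python
from math import exp


def moment_integrals(betas, max_order, lo=-8.0, hi=8.0, steps=16000):
    """Trapezoidal values of the integrals of t**n * exp(-polynomial), n = 0..max_order."""
    b1, b2, b3, b4 = betas
    h = (hi - lo) / steps
    totals = [0.0] * (max_order + 1)
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        g = exp(-(b4 * t ** 4 + b3 * t ** 3 + b2 * t ** 2 + b1 * t))
        for n in range(max_order + 1):
            totals[n] += weight * t ** n * g
    return [total * h for total in totals]


def recursion_residual(betas, alphas, n):
    """Left side minus right side of the recursion (4) for the given n >= 0."""
    b1, b2, b3, b4 = betas
    left = (4 * b4 * alphas[n + 3] + 3 * b3 * alphas[n + 2]
            + 2 * b2 * alphas[n + 1] + b1 * alphas[n])
    right = n * alphas[n - 1] if n > 0 else 0.0
    return left - right


example_betas = (0.1, 0.3, 0.2, 0.5)      # hypothetical (beta_1, ..., beta_4), beta_4 > 0
example_alphas = moment_integrals(example_betas, 7)
residuals = [recursion_residual(example_betas, example_alphas, n) for n in range(5)]
```

With $\beta_4 > 0$ the integrand is negligible at $\pm 8$, so assumption (5) holds to machine accuracy and the residuals should vanish up to rounding error.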

Given the first six moments, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ are readily determined. It will be
found that if ft > 0, ft ^ 0, then n - — », r 2 = <*> ; while if ft < 0, and ft 5* 0, 
ri and r 2 will be finite. If we set n = 0,1, 2, 3, in (4), the solutions are 

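$$\begin{aligned}
\beta_4 &= \{\alpha_3(\alpha_5 - 4\alpha_3) - (\alpha_4 - 3)(\alpha_4 - 1)\} \div 4D;\\
\beta_3 &= \{-\alpha_3(\alpha_6 - 3\alpha_4 - \alpha_3^2) + (\alpha_5 - \alpha_3)(\alpha_4 - 3)\} \div 3D;\\
\beta_2 &= \{(\alpha_3 - \alpha_5)(\alpha_5 - 4\alpha_3) + (\alpha_4 - 1)(\alpha_6 - \alpha_3^2 - 3\alpha_4)\} \div 2D;\\
\beta_1 &= \{\alpha_3(\alpha_6 - \alpha_3\alpha_5 - 3\alpha_4 + 3\alpha_3^2) - (\alpha_4 - 3)(\alpha_5 - \alpha_3\alpha_4)\} \div D,
\end{aligned} \tag{6}$$

where

$$D = (\alpha_6 - \alpha_4^2 - \alpha_3^2)(\alpha_4 - \alpha_3^2 - 1) - (\alpha_5 - \alpha_3 - \alpha_3\alpha_4)^2 \ge 0.$$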
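The closed-form solutions can be verified against the recursion (4) itself. The following sketch is an illustration, not the author's computation: the function name is an assumption, standardized moments $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$ are taken for granted, and exact rational arithmetic is used so that the check is free of rounding.

```python
from fractions import Fraction


def betas_from_moments(a3, a4, a5, a6):
    """Return (beta_1, beta_2, beta_3, beta_4) from the moments alpha_3..alpha_6."""
    a3, a4, a5, a6 = (Fraction(v) for v in (a3, a4, a5, a6))
    D = (a6 - a4 ** 2 - a3 ** 2) * (a4 - a3 ** 2 - 1) - (a5 - a3 - a3 * a4) ** 2
    if D == 0:
        raise ValueError("D = 0: the betas are not determined by these moments")
    b4 = (a3 * (a5 - 4 * a3) - (a4 - 3) * (a4 - 1)) / (4 * D)
    b3 = (-a3 * (a6 - 3 * a4 - a3 ** 2) + (a5 - a3) * (a4 - 3)) / (3 * D)
    b2 = ((a3 - a5) * (a5 - 4 * a3) + (a4 - 1) * (a6 - a3 ** 2 - 3 * a4)) / (2 * D)
    b1 = (a3 * (a6 - a3 * a5 - 3 * a4 + 3 * a3 ** 2) - (a4 - 3) * (a5 - a3 * a4)) / D
    return b1, b2, b3, b4


def recursion_holds(a3, a4, a5, a6):
    """Check that the betas satisfy recursion (4) for n = 0, 1, 2, 3."""
    b1, b2, b3, b4 = betas_from_moments(a3, a4, a5, a6)
    alpha = [Fraction(1), Fraction(0), Fraction(1)] + [Fraction(v) for v in (a3, a4, a5, a6)]
    for n in range(4):
        left = (4 * b4 * alpha[n + 3] + 3 * b3 * alpha[n + 2]
                + 2 * b2 * alpha[n + 1] + b1 * alpha[n])
        right = n * alpha[n - 1] if n > 0 else 0
        if left != right:
            return False
    return True


normal_betas = betas_from_moments(0, 3, 0, 15)   # moments of the normal curve
```

For the normal moments $\alpha_3 = 0$, $\alpha_4 = 3$, $\alpha_5 = 0$, $\alpha_6 = 15$ the formulas give $D = 12$ and $\beta_2 = 1/2$ with the other $\beta$’s zero, that is, the normal curve.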

To prove $D \ge 0$ we adopt the method of J. E. Wilkins, Jr. [4]. In only a trivial case is $D = 0$. Let

$$G(a, b, c, d) = \int_{r_1}^{r_2} (a + bt + ct^2 + dt^3)^2\,y_t\,dt \ge 0,$$

where $y_t$ is any probability function with range $r_1 \le t \le r_2$. Since $G(a, b, c, d)$ is a semi-definite quadratic form, its discriminant will be non-negative. But its discriminant is easily seen to be equal to $D$, thus

$$\begin{vmatrix}
\alpha_3 & 1 & 0 & 1\\
\alpha_4 & \alpha_3 & 1 & 0\\
\alpha_5 & \alpha_4 & \alpha_3 & 1\\
\alpha_6 & \alpha_5 & \alpha_4 & \alpha_3
\end{vmatrix} = D \ge 0. \tag{7}$$

We summarize without proofs the essential features of the fourth degree exponential. Near the normal point, $\alpha_4 = 3$, $\alpha_3 = 0$, the fourth degree exponential function, the Pearson system, and the Gram-Charlier Type A are essentially alike. Type C [5] while similar is not the same. Note that $\beta_4$ may be negative and in such a case $r_1$ and $r_2$ are the two real zeros of the derivative of (1). The exponential may be bimodal as well as unimodal and the normal curve is the special case $\beta_4 = \beta_3 = \beta_1 = 0$. Various special cases where a particular $\beta$ is zero are readily handled by either (4) or (6). The graduation of both unimodal and bimodal observed distributions will be published elsewhere.

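Let

$$y_t = k \exp\Big\{-\sum_{j=1}^{r} \beta_j t^j\Big\}, \qquad r_1 < t < r_2, \tag{8}$$

where

$$\frac{1}{k} = \int_{r_1}^{r_2} \exp\Big\{-\sum_{j=1}^{r} \beta_j t^j\Big\}\,dt. \tag{9}$$

The likelihood, $L$, in a sample of $N$ is given by

$$L = k^N \exp\Big\{-\Big[\beta_r \sum_i t_i^r + \beta_{r-1} \sum_i t_i^{r-1} + \cdots + \beta_1 \sum_i t_i\Big]\Big\}, \tag{10}$$

where $t_i = (x_i - m)/\sigma$. Then

$$\frac{\partial \log L}{\partial \beta_j} = \frac{N}{k}\frac{\partial k}{\partial \beta_j} - \sum_i t_i^j, \tag{11}$$

and

$$\frac{1}{k}\frac{\partial k}{\partial \beta_j} = k\Big[\int_{r_1}^{r_2} t^j \exp\Big\{-\sum_h \beta_h t^h\Big\}\,dt - \exp\Big\{-\sum_h \beta_h r_2^h\Big\}\frac{\partial r_2}{\partial \beta_j} + \exp\Big\{-\sum_h \beta_h r_1^h\Big\}\frac{\partial r_1}{\partial \beta_j}\Big]. \tag{12}$$

If we assume either $r_1$ and $r_2$ constant, or $\exp\{-\sum_h \beta_h r_2^h\}$ and $\exp\{-\sum_h \beta_h r_1^h\}$ negligible, then (12) becomes

$$\frac{1}{k}\frac{\partial k}{\partial \beta_j} = k\int_{r_1}^{r_2} t^j \exp\Big\{-\sum_h \beta_h t^h\Big\}\,dt,$$

and $\partial \log L/\partial \beta_j = 0$ implies

$$\frac{\displaystyle\int_{r_1}^{r_2} t^j \exp\Big\{-\sum_h \beta_h t^h\Big\}\,dt}{\displaystyle\int_{r_1}^{r_2} \exp\Big\{-\sum_h \beta_h t^h\Big\}\,dt} = \frac{\sum_i t_i^j}{N} = a_j, \qquad j = 1, 2, \ldots, r, \tag{13}$$

where $a_j$ is the sample estimate of $\alpha_j$. For, if in $\sum_i t_i^j/N$ we let $j = 1, 2$, we find by (13) that $\bar{x} = m$, and $\sigma^2 = \sum_i (x_i - \bar{x})^2/N$. The solution of (13) provides estimates of $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$, if we set $r = 4$. Naturally more time is required for the solution of (13) as compared with the method of moments, but the maximum likelihood estimates are asymptotically efficient. The system (13) must be solved by successive approximations. To determine the moments solution all we do is to replace $\alpha_j$ by $a_j$ in equations (6). This affords a point of departure from which the maximum likelihood equations may be solved. The two methods are not the same.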

The fourth degree exponential is readily generalized to a fourth (or $r$th) degree multivariate function including the normal multivariate function as a special case.


REFERENCES 

[1] R. A. Fisher, “The mathematical foundations of theoretical statistics,” Roy. Soc. Phil. Trans., Vol. 222, Series A, (1922), pp. 355-56 particularly.

[2] A. L. O’Toole, “On the system of curves for which the method of moments is the best method of fitting,” Annals of Math. Stat., Vol. 4 (1933), pp. 1-29.

[3] A. L. O’Toole, “A method of determining the constants in the bimodal fourth degree exponential function,” Annals of Math. Stat., Vol. 4 (1933), pp. 79-93.

[4] J. Ernest Wilkins, Jr., “A note on skewness and kurtosis,” Annals of Math. Stat., Vol. 15 (1944), pp. 333-335.

[5] C. V. L. Charlier, “A new form of the frequency function,” Lund universitet, Acta, N. F. Bd 24 (1928), Avd. 2, Art. no. 8, pp. 1-26.

[6] M. G. Kendall, The Advanced Theory of Statistics, Griffin & Co., London, Vol. II, p. 43.


AN APPROXIMATION TO THE BINOMIAL SUMMATION 

By G. F. Cramer 
Washington , D. C. 

We consider the binomial expansion $(q + p)^n$, where $q = 1 - p$ and $n$ is a positive integer. For given values of $n$, $p$, $r$, and $s$, where $np < r < s \le n$, we are often interested in the probability $P(r \le x \le s)$ that the number of successes $x$ will satisfy $r \le x \le s$.

When $n$ does not exceed 50, we can use tables of the Incomplete Beta Function, or other convenient and accurate tables. For “large” values of $n$, we can use normal tables. When $p$ is “small”, we can use Poisson tables. However, it is often true that $p$ is fairly small, and yet not small enough to give really accurate results when Poisson tables are employed in the usual way, while $n$ is too large for use of the tables of the Incomplete Beta Function and yet too small for accurate use of normal tables.

It frequently happens that an upper bound for $P(r \le x \le s)$ would serve our purpose. We propose to show how to find this from Poisson tables with greater accuracy than could be obtained by using these tables in the ordinary way.

We shall denote the general term of the binomial expansion by $B_i = \binom{n}{i} p^i q^{n-i}$ and the general term of the corresponding Poisson distribution with the same value of $p$ by $P_i = (pn)^i e^{-pn}/i!$. We shall also consider a second Poisson distribution whose general term is given by $P'_i = (p'n)^i e^{-p'n}/i!$, where $p' \ne p$ will be determined later.

We shall use the following notations: 

$$U_i = B_{i+1}/B_i = (n - i)p/[(i + 1)(1 - p)]; \tag{1}$$

$$V_i = P_{i+1}/P_i = pn/(i + 1); \tag{2}$$

$$V'_i = P'_{i+1}/P'_i = p'n/(i + 1); \tag{3}$$

$$U_i - V_i = p(np - i)/[(i + 1)(1 - p)]. \tag{4}$$

From (4) we obtain at once the following: 

Lemma I. Ui > Vi or Ui < Vi according as i < np or i > np. 

Thus, the size of the general term of the binomial expansion falls off more 
steeply to the right of i = np than does that of the general Poisson term. 



We can use Lemma I to obtain an upper bound to $P(r \le x \le s)$ for any $r > np$. In fact,

$$\begin{aligned}
B_r &= B_r P_r/P_r,\\
B_{r+1} &< B_r P_{r+1}/P_r,\\
B_{r+2} &< B_{r+1} P_{r+2}/P_{r+1} < B_r P_{r+2}/P_r,\\
&\;\;\vdots\\
B_s &< B_r P_s/P_r.
\end{aligned}$$

Adding these, we obtain

$$P(r \le x \le s) = \sum_{i=r}^{s} B_i < (B_r/P_r)\sum_{i=r}^{s} P_i = (B_r/P_r)\Big(\sum_{i=r}^{\infty} P_i - \sum_{i=s+1}^{\infty} P_i\Big). \tag{5}$$

The quantity in parentheses in (5) can be found by use of the cumulative Poisson table provided, of course, it is within the range of that table, while the $B_r/P_r$ can be computed directly.

In the work we have done so far, we have used a Poisson distribution which is less steep than the corresponding binomial distribution throughout the whole interval $np < r \le x \le n$. It seems reasonable to investigate the possibility of improving upon (5) by using a Poisson distribution having a different value $p'$ in place of $p$, where $p'$ is chosen so that the new Poisson distribution is of the same steepness at $x = r$ as is the binomial distribution. We wish to have $U_r = V'_r$ and $U_i < V'_i$ for all $r < i \le n$. The first of these conditions requires that $(n - r)p/[(r + 1)(1 - p)] = p'n/(r + 1)$. Solving for $p'$ we obtain

$$p' = (n - r)p/[n(1 - p)]. \tag{6}$$

We are now ready to prove the following: 

Lemma II. If $p'$ is defined by (6) and if $U_i$, $V_i$, and $V'_i$ are defined by (1), (2), and (3) respectively, then $U_i \le V'_i < V_i$, provided $r > np$ and $i \ge r$.

It is easy to see that $U_i/V'_i = (n - i)p(1 + i)/[(1 + i)(1 - p)(np')]$, and this can be reduced to $(n - i)/(n - r)$ by replacing $p'$ by its value from (6). Then $U_i/V'_i \le 1$ since $i \ge r$. Moreover, we have $V'_i/V_i = (p'n)(i + 1)/[(i + 1)(pn)] = p'/p = (n - r)/(n - np)$. But $r > np$ and hence $V'_i < V_i$. This completes the proof of Lemma II.

We are now in a position to obtain an inequality somewhat better than (5). The derivation of the new upper bound for $P(r \le x \le s)$ goes just as before except that each $P_i$ is replaced by $P'_i$. We obtain the new inequality

$$P(r \le x \le s) < K'B_r/P'_r, \tag{7}$$

where $K' = \sum_{i=r}^{\infty} P'_i - \sum_{i=s+1}^{\infty} P'_i$.

We can get a lower bound as well as a somewhat improved upper bound for $P(r \le x \le s)$ by calculating $B_r$ and $B_{r+1}$ directly and then applying (5) or (7) to find an upper bound $M$ of $P(r + 1 \le x \le s)$. This gives the inequality

$$B_r + B_{r+1} < P(r \le x \le s) < B_r + M. \tag{8}$$

This could, of course, be still further improved by calculating directly still more of the $B_i$’s and using a similar procedure, but one would not care to carry this very far.

To illustrate the various approximations, we have worked out a numerical 
example the results of which appear below. For convenience in checking, we 
have used a value of n which is within the range of the tables of the Incomplete 
Beta Function, even though we would ordinarily use our method only for larger 
values of n. 

Example. s = n = 40; r = 10; p = 1/10; p' = 1/12. The tables of the
Incomplete Beta Function give $P(10 \le x \le 40)$ = .0050631. Using Poisson
tables in the usual way, we get P(10, 4) - P(40, 4) = .008132, which is not
particularly good. Using inequality (5) we obtain: $B_{10}/P_{10}$ = .6790 and
$P(10 \le x \le 40)$ < .6790(.008132) = .005522. Using (8) and calculating both
$B_{10}$ and $B_{11}$, we take r = 11 in the inequality (5) and obtain $B_{10}$ = .0035934,
$B_{11}$ = .0010889, P(11, 4) - P(40, 4) = .002840, $B_{11}/P_{11}$ = .5657, and hence
.004682 < $P(10 \le x \le 40)$ < .003594 + .001607 = .00520. Again using
method (8), but calculating $B_{12}$ also and using r = 12 in inequality (5), we get
.004974 < $P(10 \le x \le 40)$ < .005099, which is quite good. We can obtain a
still better result by using inequality (7) instead of (5). Then p' = 1/12,
np' = 10/3, $B_{10}/P'_{10}$ = 2.150+, P(10, 10/3) - P(40, 10/3) = .002366, and
$P(10 \le x \le 40)$ < .005087.
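
The arithmetic of the Example can be repeated without tables. The Python sketch below is only illustrative and rests on stated assumptions: inequality (5) is not reproduced in this part of the text and is taken to have the same form as (7), namely $K B_r/P_r$; the constant K is taken as the Poisson mass on r, ..., s, which for s = n = 40 agrees with the tabled difference to the digits quoted; $B_i$ and $P_i$ are taken to be the binomial and Poisson probabilities of exactly i successes. All function names are hypothetical.

```python
from math import comb, exp, factorial


def binom_pmf(i, n, p):
    """Binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)


def poisson_pmf(i, mean):
    """Poisson probability of exactly i events."""
    return exp(-mean) * mean ** i / factorial(i)


def upper_bound(n, p, r, s, mean):
    """K * B_r / P_r with K = sum_{i=r}^{s} P_i (assumed form of (5) and (7))."""
    k = sum(poisson_pmf(i, mean) for i in range(r, s + 1))
    return k * binom_pmf(r, n, p) / poisson_pmf(r, mean)


n, p, r, s = 40, 0.1, 10, 40
p_prime = (n - r) * p / (n * (1 - p))                      # equation (6)
exact = sum(binom_pmf(i, n, p) for i in range(r, s + 1))   # direct binomial sum
bound5 = upper_bound(n, p, r, s, n * p)                    # inequality (5), mean np
bound7 = upper_bound(n, p, r, s, n * p_prime)              # inequality (7), mean np'
lower8 = binom_pmf(r, n, p) + binom_pmf(r + 1, n, p)       # left side of (8)
upper8 = binom_pmf(r, n, p) + upper_bound(n, p, r + 1, s, n * p)  # right side of (8)

print(f"exact = {exact:.7f}")
print(f"(5): {bound5:.6f}   (7): {bound7:.6f}")
print(f"(8): {lower8:.6f} < P < {upper8:.6f}")
```

Since the sketch evaluates every term directly, small differences from the tabled figures in the last decimal place are to be expected.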



ABSTRACTS OF PAPERS 

Presented at the Madison Meeting of the Institute, September 7-10, 1948


1. On Distribution-free Confidence Intervals (Preliminary Report). Wassily
Hoeffding, University of North Carolina, Chapel Hill.

Let $\theta(F)$ be a functional of a distribution function (d.f.) F(x) (where x is a real number
or a vector), defined over a class $\mathfrak{D}$ of d.f.’s; $O_n$ a random sample from a population with
d.f. F(x); $\underline{\theta}_n < \bar{\theta}_n$ two functions of $O_n$; and
$\alpha_n = \Pr\{\underline{\theta}_n < \theta(F) < \bar{\theta}_n\}$. Conditions are studied
under which, given $\alpha$, $0 < \alpha < 1$, we have either $\alpha_n = \alpha$ or $\alpha_n \ge \alpha$
or $\alpha_n \to \alpha$, for all F(x) in $\mathfrak{D}$, where $\mathfrak{D}$ is defined independently
of the functional form of F(x). Under fairly general conditions we can obtain by “studentization”
confidence limits $\underline{\theta}_n$, $\bar{\theta}_n$ such that $\lim_{n\to\infty} \alpha_n = \alpha$,
and $\gamma = \lim_{n\to\infty} E\sqrt{n}\,(\bar{\theta}_n - \underline{\theta}_n)$ exists; $\gamma$ is
minimized by using a least variance estimate of $\theta(F)$. If there exists a function $\kappa(\theta)$ such that
$\operatorname{var} T_n \le \kappa^2(\theta)\,n^{-1}$ if $\theta(F) = \theta$, for all F
in $\mathfrak{D}$, we can define confidence limits with a positive lower bound for $\alpha_n$. This applies to a
number of population characteristics estimated by rank order statistics, such as the coefficients
$\rho'$ and $\tau$ (estimated by Spearman’s and Lindeberg-Kendall’s rank correlation
coefficients, respectively). In certain cases (including $\rho'$ and $\tau$), $\theta(F)$ admits a binomially
distributed estimate; then exact confidence limits can easily be obtained. This research
was done under an Office of Naval Research contract.

2. On Certain Statistics for Samples of 3 from a Normal Population. Julius 
Lieblein, National Bureau of Standards, Washington. 

In analytical chemistry three determinations are frequently made. Sometimes the 
average of only the two closest results is reported, the remaining observation being rejected 
as anomalous. In preparing a critique of this procedure, Dr. W. J. Youden encountered 
a need for information on certain properties of the distributions of the statistics 
$(x' - x'')/(x_3 - x_1)$, $(x' + x'')/2$, and $(x' - x'')/2$, where x' and x'' (x' > x'') are the two
closest of the three determinations. This paper shows how these statistics differ from the 
ones heretofore treated involving “fixed” order statistics; gives the distribution of these 
statistics in random samples of 3 from a normal universe; and lists values of certain of the 
moments of their distributions. 

3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis
of Pareto’s and Pearson’s Curves. Maria Castellani, University of
Kansas City.

The purpose of this paper is to investigate the most probable configuration of N random
elements to be distributed in K (K < N) class intervals, where known forces are acting.
We shall call these intervals of energy, using the terminology of statistical mechanics.

We will prove that the most probable configuration is a configuration of statistical
equilibrium since its probability of occurring converges to 1 as N becomes infinitely large.

The main purpose of this paper is to discover which forces of attraction, operating in
the intervals of energy, give Pareto’s and Pearson’s curves when statistical equilibrium
is reached.

We will consider a random variable Y(t), t being an independent variable, obeying a
multinomial distribution law with limited freedom, and we will exploit the familiar process
of statistical mechanics. The equation of the frequency curves corresponding to the
equilibrium stage of the statistical experiment will be shown.





4. Fitting Generalized Truncated Normal Distributions. Harold Hotelling, 
University of North Carolina, Chapel Hill. 

In a sample from a multidimensional normal distribution only those individuals are supposed
to be observed which fall in a specified but arbitrary set A of positive measure. For
estimating the parameters the method of moments is proved equivalent to that of maximum
likelihood and therefore efficient. The problem is thus reduced to that of expressing the
parameters of the normal distribution in terms of the moments of the truncated distribution.
This however is not generally possible in simple explicit form. Methods are presented
for dealing numerically with several special cases, including those in which A is a
linear interval or a parallelogram.


5. On the Distribution of the Two Closest Observations Among a Set of Three 
Independent Observations. G. R. Seth, Iowa State College. 

Let $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$) be three independent ordered observations from a population
having a probability density function f(x). Let x', x'' (x' < x'') be the two closest; then the
probability density function of x', x'' is given by

$$6\, f(x')\, f(x'')\,[\,1 + F(2x' - x'') - F(2x'' - x')\,]$$

where

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

In the case f(x) is a normal distribution with unit variance, the joint distribution
of $y = x'' - x'$ and $z = \dfrac{x'' - x'}{x_3 - x_1}$ is obtained as

$$\frac{2\sqrt{3}\,y}{\pi z^{2}} \exp\left[-\frac{y^{2}(1 - z + z^{2})}{3z^{2}}\right].$$

This problem is of interest in cases where the conclusions are to be based on a set of 
three observations and one of the observations is to be rejected in the analysis of the data. 
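
A rough numerical check of the normal case can be sketched in Python. The sketch takes the joint density of y = x'' - x' and z = y/(x_3 - x_1) to be $(2\sqrt{3}\,y/\pi z^2)\exp[-y^2(1 - z + z^2)/3z^2]$ for y > 0 and 0 < z ≤ 1/2. The marginal density of z, $3\sqrt{3}/[\pi(1 - z + z^2)]$, and its cumulative form are not stated in the abstract; they are obtained here by integrating y out and are therefore assumptions of this illustration, as are all function names.

```python
import math
import random


def joint_density(y, z):
    """Assumed joint density of y = x'' - x' and z = y / (x3 - x1), unit-variance normal."""
    return (2.0 * math.sqrt(3.0) * y / (math.pi * z * z)
            * math.exp(-y * y * (1.0 - z + z * z) / (3.0 * z * z)))


def z_density_exact(z):
    """Marginal density of z obtained by integrating y out analytically."""
    return 3.0 * math.sqrt(3.0) / (math.pi * (1.0 - z + z * z))


def z_density_numeric(z, y_max=12.0, steps=60_000):
    """Same marginal by the trapezoidal rule, as a check on the algebra."""
    h = y_max / steps
    total = 0.5 * (joint_density(0.0, z) + joint_density(y_max, z))
    total += sum(joint_density(k * h, z) for k in range(1, steps))
    return h * total


def z_cdf(z):
    """Integral of z_density_exact from 0 to z (valid for 0 <= z <= 1/2)."""
    return (6.0 / math.pi) * (math.atan((2.0 * z - 1.0) / math.sqrt(3.0)) + math.pi / 6.0)


def simulate_z(trials, seed=0):
    """Draw samples of three normal observations and return the ratio z for each."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        a, b, c = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
        out.append(min(b - a, c - b) / (c - a))
    return out


zs = simulate_z(100_000)
for z0 in (0.1, 0.25, 0.4):
    empirical = sum(z <= z0 for z in zs) / len(zs)
    print(f"z0={z0}: simulated {empirical:.3f}, from density {z_cdf(z0):.3f}")
```

Agreement between the simulated and computed proportions is evidence that the density integrates to unity over the stated region, but it is no substitute for the derivation in the paper.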


6. The Derivation of Certain Recurrence Formulae and their Application to the 
Extension of Existing Published Incomplete Beta Function Tables. T. A. 
Bancroft, Alabama Polytechnic Institute, Auburn (presented by title). 

The objects of the paper are: (1) to give a number of new recurrence formulae in the
incomplete beta function derived by a new method, and (2) to indicate how these new formulae
have been used to obtain new tables of the incomplete beta function that are outside the
range of the p and q values given in the existing published tables.

The recurrence formulae have been derived by considering the incomplete beta function
as a special case of the hypergeometric series, thus

$$B_x(p, q) = \frac{x^{p}}{p}\, F(p,\, 1 - q,\, p + 1,\, x),$$

where the usual form of the hypergeometric series is

$$F(a, b, c, x) = 1 + \frac{a \cdot b}{c}\,x + \frac{a(a + 1)\,b(b + 1)}{c(c + 1)}\,\frac{x^{2}}{2!}
+ \frac{a(a + 1)(a + 2)\,b(b + 1)(b + 2)}{c(c + 1)(c + 2)}\,\frac{x^{3}}{3!} + \cdots.$$

This series converges for |x| < 1, and for x = 1 if and only if a + b < c. Certain recurrence
formulae for F(a, b, c, x) are then directly converted for use with $B_x(p, q)$, or in the so-called
normalized form $I_x(p, q)$, provided c = a + 1. All conditions have been satisfied by setting
a = p, b = 1 - q, c = p + 1, and q > 0.

For example, using the above mentioned methods we may obtain, among many others, the
recurrence formulae:

(i) $x\,I_x(p, q) - I_x(p + 1, q) + (1 - x)\,I_x(p + 1, q - 1) = 0$,

(ii) $(p + q - px)\,I_x(p, q) - q\,I_x(p, q + 1) - p(1 - x)\,I_x(p + 1, q - 1) = 0$,

(iii) $q\,I_x(p, q + 1) + p\,I_x(p + 1, q) - (p + q)\,I_x(p, q) = 0$.

Formula (i) is essentially the basic recurrence formula used to obtain Karl Pearson’s
Tables. An indication of formula (iii) in another form was given by the author in the paper
“On Biases in Estimation Due to the Use of Preliminary Tests of Significance,” Annals of
Math. Stat., Vol. 15 (1944), p. 194, and a direct proof was later given by the author in “Note
on an Identity in the Incomplete Beta Function,” Annals of Math. Stat., Vol. 16 (1945), pp.
98-99. All of the material in the present paper, however, is new, including recurrence
formulae and tables and the mathematical method of derivation.
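
The three recurrence formulae, read as (i) x I_x(p, q) - I_x(p + 1, q) + (1 - x) I_x(p + 1, q - 1) = 0, (ii) (p + q - px) I_x(p, q) - q I_x(p, q + 1) - p(1 - x) I_x(p + 1, q - 1) = 0, and (iii) q I_x(p, q + 1) + p I_x(p + 1, q) - (p + q) I_x(p, q) = 0, can be verified numerically. The Python sketch below is not the author's method of derivation; it merely evaluates I_x(p, q) by Simpson's rule, a choice made here for self-containment, and prints the left-hand sides. The function names and the trial values x = 0.3, p = 2.5, q = 3.5 are illustrative assumptions.

```python
import math


def inc_beta(x, p, q, steps=20_000):
    """Normalized incomplete beta function I_x(p, q) by composite Simpson's rule.

    Intended for p >= 1 and q >= 1 with 0 < x < 1, where the integrand is bounded.
    """
    def integrand(t):
        return t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)

    h = x / steps                      # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    complete_beta = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return (h / 3.0) * total / complete_beta


def recurrence_residuals(x, p, q):
    """Left-hand sides of formulae (i), (ii), (iii); each should be close to zero."""
    def i(a, b):
        return inc_beta(x, a, b)

    r1 = x * i(p, q) - i(p + 1, q) + (1 - x) * i(p + 1, q - 1)
    r2 = ((p + q - p * x) * i(p, q) - q * i(p, q + 1)
          - p * (1 - x) * i(p + 1, q - 1))
    r3 = q * i(p, q + 1) + p * i(p + 1, q) - (p + q) * i(p, q)
    return r1, r2, r3


residuals = recurrence_residuals(0.3, 2.5, 3.5)
print(residuals)
```

The residuals reflect only the quadrature error, so they indicate consistency of the formulae rather than the accuracy of any table.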

7. Asymptotic Studentization in Testing of Hypotheses. Herman Chernoff,
Cowles Commission for Research in Economics.

If H is a hypothesis for which $t < c_1(\theta)$ would be a good test if the value of the nuisance
parameter $\theta$ were known and $\hat{\theta}$ is an estimate of $\theta$, then the following method of asymptotic
studentization (obtaining critical regions of almost constant size) was suggested by Wald.
Consider $t < \varphi(\hat{\theta})$ where $\varphi(\theta) = c_1(\theta) + \cdots + c_s(\theta)$ and
$\Pr\{t < c_1(\theta)\} = \alpha$, $\Pr\{t - c_1(\hat{\theta}) < c_2(\theta)\} = \alpha$, ...,
$\Pr\{t - c_1(\hat{\theta}) - \cdots - c_r(\hat{\theta}) < c_{r+1}(\theta)\} = \alpha$. It is shown that under
reasonable conditions this test, and various modifications designed for those cases where the $c_r(\theta)$
are difficult to obtain exactly, have the asymptotic property that $\Pr\{t < \varphi(\hat{\theta})\} =
\alpha + O(N^{-s/2})$ where N is the size of the sample involved or an analogous variable. This
property can be extended to the case where $\theta$ is a multidimensional variable.

8. Completeness, Similar Regions, and Unbiased Estimation. (Preliminary
Report.) Erich L. Lehmann and Henry Scheffé, University of California
at Los Angeles.

A family $\mathfrak{M}$ of measures M on a space X of points x is defined to be complete
if $\int f(x)\,dM = 0$ for every M in $\mathfrak{M}$ implies f(x) = 0 except on a set A for which
M(A) = 0 for every M in $\mathfrak{M}$. For a given family of measures the question of completeness may be
regarded as the question of unicity of a related functional transform. Classical unicity
results are applicable to many families of probability distributions that have been studied by
statisticians. The notion of completeness throws light on the problem of similar regions
and the problem of unbiased estimation. The concept of a maximal sufficient statistic
(roughly, a sufficient statistic that is a function of all other sufficient statistics) is developed.
A constructive method of finding such is given, which seems to apply to all examples
ordinarily considered in statistical theory. A relation between completeness and maximality
is found.

9. On a Proposed Method for Estimating Populations. Cecil C. Craig,
University of Michigan, Ann Arbor.

It was proposed to the author by a biologist that a method be devised for estimating the
total population in an area which shall utilize the minimum distances between randomly
chosen individuals and their neighbors in directions lying in each of the four quadrants.
Assuming that the area is a square and that the distribution law over it is rectangular, it
turns out that the complete distribution of the lengths of sides of minimum squares which
contain a second individual is simpler than that of minimum distances. In both cases a
simple estimate is found which uses most but not all of the information in a sample and
whose efficiency is comparable to that based on a complete enumeration of a sample area,
though such an enumeration is not always possible.

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi- 
Maximum-likelihood Estimates. Herman Rubin, Institute for Advanced 
Study. 

The author investigates the asymptotic normality of maximum- and quasi-maximum- 
likelihood estimates of parameters of systems of linear stochastic difference equations. 
The principal tool is the extension of the Central Limit Theorem to dependent variables
previously obtained by the author (presented to the American Mathematical Society in April,
1948). The results obtained are analogous to those in the case in which no differences are 
present. Some extensions are also made to systems of stochastic difference equations linear 
in the coefficients but not necessarily in the variables. If the complete system of stochastic 
difference equations is linear in the jointly dependent variables, asymptotic efficiency is 
demonstrated for maximum-likelihood estimates. 


11. The Probability Points of the Distribution of the Median in Random Samples
from Any Continuous Population. Churchill Eisenhart, Lola S. Deming,
and Celia S. Martin, National Bureau of Standards, Washington.

The abscissa of the (one-tail) $\epsilon$-probability point of the distribution of the median in
random samples of size n = 2m + 1 (m > 0) from any continuous population is identical
with the abscissa of the corresponding $P_{\epsilon,n}$-probability point of the parent distribution,
where $P_{\epsilon,n}$ is determined by

(1) $\sum_{k=\frac{1}{2}(n+1)}^{n} C^{n}_{k}\, P_{\epsilon,n}^{\,k}\,(1 - P_{\epsilon,n})^{n-k} = \epsilon$, $(0 < \epsilon < 1)$.

From (1) it follows that

(2) $P_{1-\epsilon,n} = 1 - P_{\epsilon,n}$

and that

(3) $P_{\epsilon,n} = x_{\epsilon}(n + 1,\, n + 1) = \dfrac{1}{1 + F_{\epsilon}(n + 1,\, n + 1)} = \dfrac{1}{1 + e^{2 z_{\epsilon}(n + 1,\, n + 1)}}$,

where $x_{\epsilon}(\nu_1, \nu_2)$, $F_{\epsilon}(\nu_1, \nu_2)$, and $z_{\epsilon}(\nu_1, \nu_2)$ denote the $\epsilon$-probability
points of the incomplete-beta-function distribution, Snedecor’s F-distribution and Fisher's z-distribution, for
$\nu_1$ (= 2q) and $\nu_2$ (= 2p) ‘degrees of freedom’, respectively. The foregoing results are
certainly not “new”: Harry S. Pollard implicitly utilized the first equality on the extreme left
of (3) in his doctoral dissertation at the University of Wisconsin in 1933 (see Annals of
Math. Stat., Vol. 5 (1934), p. 250), and John H. Curtiss has given the generalization of (1)
appropriate to the case of the ‘rth position’ in random samples from any continuous
population (see Amer. Math. Monthly, Vol. 50 (1943), p. 103) and utilized (3) explicitly to obtain
the 5% point of the distribution of the median in random samples of size n = 23. The aim
of the present paper is to give these results somewhat greater publicity; they are hardly
“well known”. To this end a table (Table 1) is given of the values of $P_{\epsilon,n}$ to 5 significant
figures for $\epsilon$ = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25 and n = 3(2)15(10)95, together
with expressions from which $P_{\epsilon,n}$ can be evaluated accurately and conveniently for values
of n (and $\epsilon$) not included in the table. Numerical examples illustrate the use of the table
and formulas. Concise derivations of the fundamental relations and formulas are given
in an appendix.
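
For values of n and ε outside a table, the defining relation can also be solved directly by machine. The Python sketch below reads relation (1) as the requirement that the sum of C(n, k) P^k (1 - P)^(n-k) over k = (n + 1)/2, ..., n equal ε, which is an assumption about the printed formula, and solves it for P by bisection. It is offered only as an illustration and is not the computing method of the paper; the function names are hypothetical.

```python
import math


def median_tail_probability(P, n):
    """Probability that at least (n + 1)/2 of n observations fall below the P-point."""
    return sum(math.comb(n, k) * P ** k * (1.0 - P) ** (n - k)
               for k in range((n + 1) // 2, n + 1))


def p_eps_n(eps, n, tol=1e-12):
    """Solve relation (1), as read above, for P by bisection (n odd, 0 < eps < 1)."""
    if n % 2 == 0 or not 0.0 < eps < 1.0:
        raise ValueError("n must be odd and eps must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The tail probability increases with P, so move the bracket accordingly.
        if median_tail_probability(mid, n) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (3, 5, 15):
    p_low = p_eps_n(0.05, n)
    p_high = p_eps_n(0.95, n)
    print(n, round(p_low, 5), round(p_high, 5), round(p_low + p_high, 5))
```

The last column illustrates relation (2): the values for ε and 1 - ε sum to unity.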


12. On the Arithmetic Mean and the Median in Small Samples from the Normal
and Certain Non-Normal Populations. Churchill Eisenhart, Lola S.
Deming, and Celia S. Martin, National Bureau of Standards, Washington.


Let $\bar{x}_{\epsilon,n}$ and $\tilde{x}_{\epsilon,n}$ denote the abscissae of the one-tail $\epsilon$-probability points of the
arithmetic mean and the median, more specifically, the abscissae exceeded with probability $\epsilon$
by the mean and the median, respectively, in random samples of size n (= 2m + 1) from
any specified population, and let $\sigma_{\bar{x}_n}$ and $\sigma_{\tilde{x}_n}$ denote the standard deviations of the mean and
the median in such samples, respectively. The following symmetrical populations with
zero location parameters and unit scale parameters are considered in this paper:

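Type                                  Density                                  Range

normal (Gaussian)                     $e^{-x^{2}/2}/\sqrt{2\pi}$                 $-\infty \le x \le \infty$

double-exponential (Laplace)          $\tfrac{1}{2}\,e^{-|x|}$                   $-\infty \le x \le \infty$

rectangular (uniform)                 $1$                                      $-\tfrac{1}{2} \le x \le \tfrac{1}{2}$

Cauchy                                $\dfrac{1}{\pi}\,\dfrac{1}{1 + x^{2}}$     $-\infty \le x \le \infty$

sech                                  $\dfrac{1}{\pi}\,\operatorname{sech} x$    $-\infty \le x \le \infty$

sech² (derivative of “logistic”)      $\tfrac{1}{2}\,\operatorname{sech}^{2} x$  $-\infty \le x \le \infty$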

Using the basic table, relating probability points of the distribution of the median to
probability points of the parent distribution, given in Churchill Eisenhart, Lola S. Deming and
Celia S. Martin, “The probability points of the distribution of the median in random
samples from any continuous population,” values of $\tilde{x}_{\epsilon,n}$ for random samples from each of the
above distributions have been evaluated, and are tabulated to 5 decimal places in the
present paper, for n = 3(2)15(10)95 and $\epsilon$ = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25.

In the case of the normal distribution, values of $\bar{x}_{\epsilon,n}$ to 5 decimal places are given also for
the aforementioned combinations of $\epsilon$ and n. Comparison of the values of $\bar{x}_{\epsilon,n}$ and $\tilde{x}_{\epsilon,n}$
gives precise numerical meaning to the well-known lesser accuracy of the median as an
estimator of the center of a normal population, for samples of any odd size (n = 2m + 1).
Values of the ratio $R_{\epsilon,n} = \tilde{x}_{\epsilon,n}/\bar{x}_{\epsilon,n}$ are given also for this case (normal population), to 4
decimal places for the above combinations of $\epsilon$ and n, together with the best available values
of $\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$ for n = 3(2)15(10)55. When $0 < \epsilon \le 0.025$, the ratio $R_{\epsilon,n}$ exceeds the ratio
$\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$, showing that the ‘tails’ of the exact distribution of the median are ‘longer’ than the tails of
the normal distribution with the same mean and standard deviation; and, when $0.05 \le \epsilon \le 0.25$,
the ratio $R_{\epsilon,n}$ is less than $\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$. (A theoretical argument shows that the point
of equality is close to the 0.042-probability point.) A method for computing $\sigma_{\tilde{x}_n}$, based on
the foregoing, is given that is believed to be accurate to $.001/\sqrt{n}$, or better, for $n \ge 3$.

In the case of the double-exponential distribution, values of $\bar{x}_{\epsilon,n}$ are given to 4 decimal
places for n = 3(2)11, and $\epsilon$ = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, for comparison with the
corresponding values of $\tilde{x}_{\epsilon,n}$. It is found that when n = 3, $\bar{x}_{\epsilon,n} < \tilde{x}_{\epsilon,n}$ for $\epsilon$ = 0.005, 0.01,
and 0.025, indicating that in random samples of 3 from a double-exponential distribution
the arithmetic mean furnishes narrower confidence limits for the center of the distribution
at 0.95, 0.98, and 0.99 levels of confidence. When n = 5, the mean is ‘better’ at the .98 and
.99 levels of confidence; and, when n = 7, at the 0.99 level. For all other combinations of
$\epsilon$ and n ($n \ge 3$), the median is ‘better.’

In the case of the rectangular distribution, values of $\bar{x}_{\epsilon,n}$ are tabulated to 4 decimals for
n = 3(2)9, and values of the $\epsilon$-probability point of the mid-range in samples of n,
for n = 3(2)15(10)95, in each instance for $\epsilon$ = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, and in the case
of $\bar{x}_{\epsilon,n}$ for $\epsilon$ = 0.001 also. The superiority of the midrange over the mean and the median,
well-known but here exhibited numerically for the first time, is truly amazing.

It is planned to provide values of $\bar{x}_{\epsilon,n}$ for samples from the sech and sech² distributions in
the final paper.


13. The Relative Frequencies with which Certain Estimators of the Standard 
Deviation of a Normal Population Tend to Underestimate its Value. 

Churchill Eisenhart and Celia S. Martin, National Bureau of Standards, 
Washington. 


Let $x_1, x_2, \ldots, x_n$ denote a random sample of n independent observations from a normal
population with mean $\mu$ and standard deviation $\sigma$. Common estimators of $\sigma$ are

$$s_1 = \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}/n}, \qquad m_1 = \sqrt{\pi/2}\,\sum_{i=1}^{n}|x_i - \bar{x}|/n,$$

$$s_2 = s_1\sqrt{n/(n - 1)}, \qquad s_3 = s_1/c_2, \qquad m_2 = m_1\sqrt{n/(n - 1)},$$

and $R_1 = (x_L - x_S)/d_2$, where $\bar{x} = \sum x_i/n$, $x_L$ is the largest and $x_S$ the smallest of the
x's, $c_2 = E(s_1)$, and $d_2 = E(x_L - x_S)$, the symbol E( ) denoting “mathematical expectation
(or mean value) of.” A table is given that shows to 3 decimals the relative frequencies
(probabilities) with which these estimators tend to underestimate $\sigma$ when n = 2(1)10, 12,
15, 20, 24, 30, 40, 60. The results show among other things that, for very small samples
($n \le 10$) such as chemists and physicists commonly use, Bessel’s formula for the probable
error, which is based on $s_2$, has a marked downward bias in the probability sense (in
addition to its known slight downward bias in the mean value sense), whereas Peters' formula,
which is based on $m_2$, has only a slight downward bias in the probability sense and no bias
in the mean value sense. A table of divisors is given by means of which “median
estimators” of $\sigma$ can be computed readily from the basic quantities $\sum_{i=1}^{n}(x_i - \bar{x})^{2}$,
$\sum_{i=1}^{n}|x_i - \bar{x}|$, and $(x_L - x_S)$, that is, estimators that will over- and underestimate $\sigma$
equally often in repeated use. An application to control charts is noted. Median estimators, like maximum
likelihood estimators (“modal estimators”) have the useful property that if T is a median
estimator of $\theta$, then f(T) is a median estimator of $f(\theta)$, a property unfortunately not possessed
by the customary “unbiased” (“mean”) estimators.
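
The “probability sense” of bias can be illustrated for the estimator with divisor n - 1, taken here to be $s_2 = \sqrt{\sum (x_i - \bar{x})^2/(n - 1)}$. It underestimates σ exactly when the sum of squares divided by σ², a chi-square variable with n - 1 degrees of freedom, falls below n - 1. The Python sketch below compares a simulated relative frequency with the closed form available when n - 1 is even. It is an illustration under these stated assumptions, not a reproduction of the table described in the abstract; the function names, the seed, and the number of trials are arbitrary.

```python
import math
import random


def underestimate_frequency(n, trials=50_000, seed=1):
    """Simulated relative frequency with which s2 falls below sigma (sigma = 1)."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        sum_sq = sum((x - xbar) ** 2 for x in xs)
        if sum_sq / (n - 1) < 1.0:      # s2 squared compared with sigma squared
            below += 1
    return below / trials


def chi_square_cdf_even(x, dof):
    """P(chi-square < x) in closed form; valid only for an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2.0
    return 1.0 - math.exp(-half) * sum(half ** j / math.factorial(j)
                                       for j in range(dof // 2))


for n in (3, 5, 9):
    simulated = underestimate_frequency(n)
    theoretical = chi_square_cdf_even(n - 1, n - 1)
    print(f"n={n}: simulated {simulated:.3f}, chi-square value {theoretical:.3f}")
```

For n = 5 the closed form is $1 - 3e^{-2}$, so that underestimation is the more frequent outcome in samples of that size.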


14. Some Non-Parametric Tests of Whether the Largest Observations of a Set 
are too Large. (Preliminary Report.) John E. Walsh, Douglas Aircraft 
Company, Santa Monica, California. 

Let x(1), ..., x(n) represent the values of n observations arranged in increasing order of
magnitude. By hypothesis these observations have the properties: (1) They are independent
and from continuous symmetrical populations. (2) For large n the variances of the tail
order statistics are either very large or very small compared with the variances of the
central order statistics. (3) For large n the tail order statistics are approximately independent
of the central order statistics. (4) Each observation is from a population whose median is
either $\theta$ or $\varphi$, where x(n - r + 1), ..., x(n) are from populations with median $\theta$ while the
central and smaller order statistics are from populations with median $\varphi$. The test is:
Accept $\varphi < \theta$ if $\min\,[x(n - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(t_\alpha)$, where $i_u < i_{u+1}$, $j_v < j_{v+1}$,
$i_s = r - 1$, and $t_\alpha$ is defined by $\Pr[x(t_\alpha) < \varphi \mid \theta = \varphi] = \alpha$. Here

$$\alpha = \Pr\{\min\,[x(n - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2\varphi \mid \theta = \varphi\}.$$

For large n the significance level of the test is approximately $\alpha$ while the significance level
does not exceed $2\alpha$ for any value of n. Suitable values of $\alpha$ can be obtained for $r \le 4$. As
$\theta - \varphi \to -\infty$ the power function tends to zero, while the power function tends to unity as
$\theta - \varphi \to \infty$. For $\theta - \varphi < 0$ the power function is monotonically increasing.


15. On the Bounded Significance Level Properties of the Equal-tail Sign Test 
for the Mean. John E. Walsh, Douglas Aircraft Company, Santa Monica, 
California, (Presented by Title). 


The equal-tail sign test for deciding whether the population mean $\mu$ is equal to a given
hypothetical value $\mu_0$ is defined by: Accept $\mu \ne \mu_0$ if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$,

Here $x_j$, (j = 1, ..., n), is the j th largest of n independent observations drawn from n
populations which satisfy the conditions: (i) The mean of each population has the value $\mu$.
(ii) Each population is continuous at its mean. (iii) The mean is at a 50% point for each
population. This paper investigates how the significance level of the equal-tail sign test 
varies when (i)-(iii) are not satisfied. It is found that the significance level does not differ 
noticeably from its hypothetical value under conditions much more general than (i)-(iii). 
This significance level stability, combined with the properties of being easily applied and 
reasonably efficient for small samples from a normal population, suggests that the equal- 
tail sign test be considered for application whenever the population mean is to be tested on 
the basis of a small number of observations. 
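
When conditions (i)-(iii) hold and the true mean equals the hypothetical value, each observation independently exceeds that value with probability 1/2, so the number of exceedances is binomial and the nominal significance level of the equal-tail sign test follows by a standard computation. The Python sketch below evaluates it; the expression is a textbook binomial calculation supplied here as an assumption, not a formula quoted from the abstract, and the function name is hypothetical. The index i is counted as in the abstract, with the test rejecting when the i-th largest observation lies below the hypothetical value or the i-th smallest lies above it.

```python
from math import comb


def sign_test_level(n, i):
    """Nominal level of the equal-tail sign test with index i (requires 2*(i - 1) < n)."""
    if i < 1 or 2 * (i - 1) >= n:
        raise ValueError("need i >= 1 and 2*(i - 1) < n so that the two tails are disjoint")
    # One tail: at most i - 1 of the n observations fall on a given side of the value.
    one_tail = sum(comb(n, j) for j in range(i))
    return 2 * one_tail / 2 ** n


for n, i in [(5, 1), (10, 2), (15, 4)]:
    print(n, i, sign_test_level(n, i))
```

Because the level can take only the values permitted by the binomial sums, small samples admit only a few attainable levels; for n = 5 and i = 1 the level is 2/32.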


16. Infinitely Divisible Distributions. William Feller, Cornell University, 
Ithaca, New York. 

A simple derivation of P. Lévy’s formula is given starting from the following definition:
a distribution function F(x) is infinitely divisible if for every n it is possible to find finitely
many distributions $F_{k,n}(x)$ such that $F(x) = F_{1,n}(x) * \cdots * F_{N,n}(x)$ and that $F_{k,n}(x)$ tends
to the unitary distribution uniformly in n. This definition is more general than the one
used by P. Lévy and Khintchine. The equivalence of the two definitions was proved by
Khintchine by deep methods. The new approach renders the equivalence obvious.
Furthermore, a new characterization of infinitely divisible distributions is given; it is
equivalent to Gnedenko’s characterization but requires no special analytical tools.

17. Fluctuation Theory of Recurrent Events. William Feller, Cornell
University, Ithaca, New York.

Consider a sequence of independent or dependent trials but suppose that each has a
discrete sample space. The paper studies recurrent patterns $\mathcal{E}$ which can be roughly
characterized by the property that after every occurrence of $\mathcal{E}$ the process starts from scratch,
the conditional probabilities coinciding with the original absolute probabilities. Typical
examples are success runs, returns to equilibrium, zeros of sums of independent variables,
passages through a state in a Markov chain. New methods are developed unifying and
simplifying previous theories and applying to larger classes of recurrent events. It is shown
in an elementary way that the probability that $\mathcal{E}$ occurs at the n-th trial either has a limit or is
asymptotically periodic. This theorem has many consequences. For example, the ergodic
properties of discrete Markov chains follow in a few lines, and the difference between finite
and infinite chains disappears. Several theorems of the renewal type are proved. Weak
and strong limit theorems for the number $N_n$ of occurrences of $\mathcal{E}$ in n trials are derived
shedding new light on stable distributions.
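
The dichotomy between a limit and asymptotic periodicity can be seen in a small computation. In the Python sketch below, f[k] denotes the probability that a recurrent event first occurs at trial k, and the probabilities u[n] of occurrence at trial n are obtained from the recurrence u[n] = f[1]u[n-1] + ... + f[n]u[0] with u[0] = 1. This recurrence, and the value 1/(mean recurrence time) for the limit in the non-periodic case, are standard facts of renewal theory supplied here for illustration and are not quoted from the abstract; the function name and the two recurrence-time distributions are hypothetical.

```python
def occurrence_probabilities(f, n_max):
    """Return u[0..n_max], where u[n] is the probability of an occurrence at trial n.

    f[k] is the probability that the first occurrence is at trial k; f[0] must be 0.
    """
    u = [1.0] + [0.0] * n_max
    for n in range(1, n_max + 1):
        u[n] = sum(f[k] * u[n - k] for k in range(1, min(n, len(f) - 1) + 1))
    return u


# Recurrence time 1 or 2 with equal probability: non-periodic, mean recurrence time 1.5.
aperiodic = occurrence_probabilities([0.0, 0.5, 0.5], 60)

# Recurrence time always 2: occurrences are possible only at even trials.
periodic = occurrence_probabilities([0.0, 0.0, 1.0], 10)

print("non-periodic case, u[60]:", aperiodic[60], " 1/mean:", 1 / 1.5)
print("periodic case, u[0..10]:", periodic)
```

In the first case the probabilities settle down to a limit; in the second they alternate indefinitely between 0 and 1.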

18. Formulas for the Percentage Points of the Distributions of the Arithmetic 
Mean in Random Samples from Certain Symmetrical Universes. Uttam 
Chand, University of North Carolina and National Bureau of Standards, 

Using the method of Fisher and Cornish, the 100$\epsilon$% point of the distribution of the
arithmetic mean in random samples of size N from any universe having finite cumulants of the
first four orders, $\kappa_1$, $\kappa_2$, $\kappa_3$, $\kappa_4$, is expressed to order $1/N^{2}$ as a function of N, the 100$\epsilon$% point
of a standardized normal deviate and the quantities $\kappa_1$, $\kappa_2$, $\kappa_4/\kappa_2^{2}$. The numerical
coefficients are evaluated for the cases of sampling from rectangular, double-exponential,
sech and sech² distributions. The application of the resulting formulas is illustrated
numerically for $\epsilon$ = .001, .005, .010, .025, .050, .100, and .250. In the case of the rectangular
and double-exponential distributions, the results obtained for N = 10 are compared with
accurate values, indicating the accuracy of the formulas.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Professor T. A. Bickerstaff has been appointed Chairman of the Department 
of Mathematics at the University of Mississippi. 

Professor Raj Chandra Bose has resigned as head of the graduate Department 
of Statistics of the University of Calcutta, and has been appointed Professor of 
Mathematical Statistics at the University of North Carolina beginning in the 
winter of 1949. Professor Bose is an authority on the design of experiments 
and is writing a book on the combinatorial mathematics of the subject. He has 
also published extensive contributions to differential geometry and to
multivariate statistical analysis, and has been instrumental in developing practical
sample surveys. He served as Visiting Professor in the Institute of Statistics 
at North Carolina in the winter and spring of 1948. 

Mr. Hamilton Brooks's paper, “The Probable Breakdown Voltage of Paper
Dielectric Capacitors,” was one of the four papers selected for a national award
by the American Institute of Electrical Engineers. His paper presents the
statistical treatment of an engineering problem and shows by experiment how
insulation strength distribution is determined by the distribution of the extreme 
size of flaws. 

Dr. C. West Churchman, formerly a member of the staff at the University of 
Pennsylvania, was appointed Associate Professor of Philosophy at Wayne
University, Detroit 1, Michigan, starting February 1, 1948.

Dr. William G. Cochran has accepted an appointment as Professor of
Biostatistics in the School of Hygiene and Public Health of the Johns Hopkins
University and will assume this post in September. Dr. Cochran, a native of
Glasgow, Scotland, comes to Johns Hopkins from the University of North
Carolina where he served as Associate Director of the Institute of Statistics from 1946
until the present.

Dr. Louis M. Court has been promoted to an assistant professorship in the 
Mathematics Department of Rutgers University. 

Dr. Donald A. Darling, formerly a member of the staff at Cornell University, 
has accepted an assistant professorship at Rutgers University. 

Mr. Aryeh Dvoretzky has been appointed a member of the Institute for
Advanced Study, Princeton, New Jersey, for the 1948-1949 academic year.

Mr. Arnold King, formerly Director of Research in Statistical Methodology 
for the Bureau of Agricultural Economics at Iowa State College, was appointed 
Managing Director of National Analysts, Inc., Philadelphia on July 1, 1948. 

Mr. Charles L. Marks has resigned his position as instructor of mathematics 
at the University of North Carolina to accept a teaching appointment in the 
Department of Statistics, The George Washington University, Washington 6, 
D.C. 




Miss Doris Newman has accepted an appointment at the U. S. Naval Medical 
Research Laboratory, U. S. Naval Submarine Base, New London, Conn. 

Dr. Ernest Rubin has been transferred from the Immigration and
Naturalization Service, General Research Section, Washington, D. C. to the European
Branch, Areas Division, Office of International Trade in the Department of 
Commerce as an Economic Statistician. 

Mr. David Rubinstein has been promoted from Junior Research Assistant in 
the Statistical Laboratory, University of California, Berkeley, to a Teaching 
Assistant. 

Miss Elizabeth L. Scott, formerly an Associate and Research Assistant in the 
Statistical Laboratory, University of California, Berkeley, has been promoted to 
Lecturer and Research Assistant. 

Dr. Gobind R. Seth, who was formerly a student at Columbia University, has 
accepted an associate professorship in statistics at the Statistical Laboratory, 
Iowa State College. 

Dr. Charles M. Stein has been promoted to an assistant professorship in the 
Statistical Laboratory, University of California, Berkeley. 

Professor Gerhard Tintner is on leave of absence for one year from the Iowa 
State College to join the Department of Applied Economics at Cambridge
University, Cambridge, England as a Research Associate.

Mr. L. H. C. Tippett, Chief Statistician of the British Cotton Industry
Research Association, delivered twelve one-hour lectures on Statistical Quality
Control and Industrial Experimentation at a conference at the Massachusetts Institute
of Technology, May 5-14, before a large audience. Dr. W. A. Shewhart of the 
Bell Telephone Laboratories addressed a large audience on the Future of Statistics 
in Industrial Research and Quality Control on May 14 at the same conference. 


Scientists and Reserve Officers 

The Department of the Army has established a program of particular interest 
to statisticians and other scientists who hold Reserve commissions in the Army, 
and who are professionally engaged in teaching or research and development. 

The objectives of the program are to: 

(1) maintain the useful affiliation of statisticians and other scientists with the 
Organized Reserve Corps, 

(2) provide peacetime Reserve assignments for these officers, enabling
optimum utilization of their education, experience and skills,

(3) furnish mobilization assignments which will fully utilize their talents, and 

(4) adequately prepare these officers for mobilization. 

The Technical Services of the Department of the Army submit to these 
Research and Development Reserve Groups research problems and projects which 
pose an intellectual challenge to members of the group. Thus, the program 
provides members of each group a type of training which is in keeping with their 
scientific and technical interests and competence, rather than a traditional 
kind of training session in which scientists have little or no interest. 



NEWS AND NOTICES 




The program is now being implemented only in those areas where there is a 
definite local interest. To date, eighteen Research and Development Reserve 
groups have been organized. Twelve additional groups are in process of 
organization. Others are in the initial stages of formation. Several of these groups 
have been formed in communities in which large universities, industrial research 
laboratories, or private research foundations are located. Typical localities are 
Chicago, Illinois; Wilmington, Delaware; Newark, New Jersey; Houston, Texas; 
Washington, D. C.; Manhattan and Lawrence, Kansas; Champaign-Urbana, 
Illinois; Pittsburgh, Pennsylvania; Denver, Colorado; and Detroit, Michigan. 

Provision is made to submit research projects of interest to all categories of 
scientists—chemists, physicists, engineers, geologists, geographers, psychologists, 
mathematicians, statisticians and all of the biological scientists. 

Reserve officers who are currently engaged in civilian research, college or 
university teaching, or industrial research or development, or who in the past 
have had specific research experience are eligible to make application for 
assignment to an Organized Reserve Research and Development Group. A group 
may be organized in any locality where there are twenty (20) or more qualified 
officer scientists who desire to participate in the program. A subgroup may be 
organized with ten (10) qualified members. 

The program is under the general direction of the Research and Development 
Group, Logistics Division, General Staff, United States Army. The entire 
program is outlined in Department of the Army Circular Number 127, dated 5 
May 1948. 

Inquiry about organization of an Organized Reserve Research and 
Development Group or about assignment to a group already organized should be made 
of the Unit Instructor, ORC, or of the Senior Army Instructor, ORC, in the 
locality in which the officer resides. In localities in which a group has already 
been organized, the Commanding Officer of the group will consider applications 
for assignment of additional officers. 


New Members 

The following persons have been elected to membership in the Institute 
(June 1 to August 15, 1948) 

Anderson, Hjalmar, Jr. (Univ. of Oregon Medical School) Student, Turner, Oregon. 

Banerjee, Kali Shankar, M.A. (Calcutta Univ.) Statistician, Central Sugar Cane Research 
Station, P.O. Pusa, Bihar, India. 

Bordelow, Derrill Joseph, B.S. (Louisiana State Univ.) Associate Physicist with Naval 
Ordnance Laboratory, 602 A. Street S.E., Washington, D. C. 

Cowan, David, B.S. (Tufts Univ.) Research Analyst, War Department, 89 Lewis Street, 
East Lynn, Massachusetts. 

Frederiksen, Norman, Ph.D. (Syracuse Univ.) Research Associate, Educational Testing 
Service; Associate Professor of Psychology, Princeton University, Educational 
Testing Service, Box 592, Princeton, N. J. 

Gehman, Harry M., Ph.D. (Univ. of Pennsylvania) Professor of Mathematics, University 
of Buffalo, 168 Winspear Avenue, Buffalo 15, New York. 





Hofmann, John E., A.M. (Univ. of Minnesota) Senior Research Fellow, 8222 Oakland 
Street, Ames, Iowa. 

Kimball, Allyn W., Jr., B.S. (Univ. of Buffalo) Research Statistician, Department of 
Biometrics, School of Aviation Medicine, Randolph Field, Texas. 

King, Edgar P., Jr., B.S. (Carnegie Institute of Technology) Teaching Assistant in 
Mathematics, Department of Mathematics, Carnegie Institute of Technology, Pittsburgh 
13, Pennsylvania. 

Link, Curtis K., B.S. (Univ. of Oregon) Graduate Student-Assistant, 750 W. 6th Street, 
Eugene, Oregon. 

Leider, Nathan, B.A. (College of the City of N. Y.) Mathematician P-2, 1841 Summit 
Place, N.W., Washington 9, D. C. 

Manos, Nicholas E., M.A. (Univ. of Calif.) Meteorologist and Statistician, 1424 Rhode 
Island Avenue, N.W., Washington 5, D. C. 

Peters, Stefan, Ph.D. (Erlangen, Germany) Lecturer at the University of California, 1207 
Peralta Avenue, Berkeley 6, California. 

Petrou, Nicholas V., M.Sc. (Harvard Univ.) Electrical Engineer, Project Engineer, 
Westinghouse Electric Corporation, 1844 Ardmore Blvd., Pittsburgh 21, Pennsylvania. 

Prakash, Aditya, M.A. (Univ. of Michigan) Student, c/o Mathematics Department, 
University of Michigan, Ann Arbor, Michigan. 

Read, Robert R., B.S. (Oregon State College) Apprentice Engineer, Inventory and Costs 
Division, Pacific Telephone and Telegraph Company, 8207 N.E., SO, Portland, Oregon. 

Seiden, Esther, M.A. (Vilno, Poland) Research Assistant, Statistical Laboratory, 
University of California, 2116 Derby Street, Berkeley 5, California. 

Sodano, John J., B.S. (Queens College) Student, Mathematical Statistics, Columbia 
University, 172-15 93rd Avenue, Jamaica 8, New York. 

Stillinger, Richard C., M.S. (Univ. of Michigan) Graduate Student, 1868 Weston Court, 
Willow Run, Michigan. 

Swan, Albert W., B.A.Sc. (Univ. of Toronto) Statistical Section Research and 
Development Department, The United Steel Company Limited, c/o The United Steel 
Companies Ltd., 17 Westbourne Road, Sheffield 10, England. 

Tate, Robert F., A.B. (Univ. of Calif.) Teaching Fellow, Department of Mathematical 
Statistics, Phillips Hall, Chapel Hill, North Carolina. 

Teichroew, Dan, B.A. (Univ. of Toronto) Division of Research, Department of Lands and 
Forest, South Baymouth, Ontario, Canada. 

Tyler, Leona E., Ph.D. (Univ. of Minnesota) Associate Professor of Psychology, 
Department of Psychology, University of Oregon, Eugene, Oregon. 



ADOPTION OF THE NEW CONSTITUTION 

The chief order of business at the business meeting of the Institute held at 
Madison, Wisconsin, on September 10, 1948, was the adoption of the new 
Constitution. The draft mailed to the members in August, 1948, was adopted 
unanimously after two changes had been made. They were: (1) the insertion of the 
word “Article” before each of the respective articles and (2) the elimination of 
the first “the” in the third line and fourth paragraph of Article 4. 

Other business transacted at the meeting included a report of the Secretary- 
Treasurer on the financial condition of the Institute indicating that while the 
Institute is just operating within its income during 1948, steps will have to be 
taken to provide the additional revenue needed for 1949. It was decided not to 
raise dues for 1949 but to attempt to raise additional funds by: (1) an immediate 
appeal to universities and other institutions which are sponsoring research in 
mathematical statistics for contributions to the Institute and (2) an appeal to 
the members of the Institute to make additional contributions at the time of the 
payment of their annual dues. 

Other matters under consideration at the meeting included a reading and 
discussion of a proposed revision of the By Laws, the announcement of the dates 
and locations of future meetings of the Institute and the passing of a resolution 
of thanks to those contributing to the success of the Madison meeting. 

A copy of the official minutes of this meeting may be obtained on request from 
the Secretary-Treasurer. 

P. S. Dwyer 
Secretary-Treasurer 





REPORT ON THE MADISON MEETING OF THE INSTITUTE 


The Eleventh Summer Meeting of the Institute of Mathematical Statistics 
was held at the University of Wisconsin, Madison, Wisconsin, Tuesday, 
September 7 through Friday, September 10, 1948. The meeting was held in 
conjunction with the summer meetings of the American Mathematical Society, the 
Mathematical Association of America and the Econometric Society. The 
following eighty members of the Institute attended the meeting: 

C. B. Allendoerfer, V. L. Anderson, K. J. Arnold, H. M. Bacon, A. S. Barr, Walter Bartky, 
H. P. Beards, A. A. Bennett, T. A. Bickerstaff, J. H. Bushey, Maria Castellani, Uttam Chand, 
Herman Chernoff, C. C. Craig, J. H. Curtiss, G. B. Dantzig, D. B. De Lury, J. L. Doob, A. M. 
Dutton, P. S. Dwyer, Mrs. Daisy Edwards, Churchill Eisenhart, H. P. Evans, C. H. Fischer, 
J. E. Freund, H. M. Gehman, H. H. Germond, M. A. Girshick, Casper Goffman, P. R. Halmos, 
W. G. Hart, E. H. C. Hildebrandt, Wassily Hoeffding, D. G. Horvitz, Harold Hotelling, A. S. 
Householder, M. H. Ingraham, Leo Katz, Oscar Kempthorne, J. F. Kenney, W. M. Kincaid, 
T. C. Koopmans, H. D. Larsen, Walter Leighton, H. B. Mann, A. M. Mark, Jacob Marschak, 
A. W. Marshall, Kenneth May, M. R. Mickey, Jr., Dorothy J. Morrow, C. J. Nesbitt, M. J. 
Netzorg, John von Neumann, Jerzy Neyman, G. B. Price, C. J. Rees, J. S. Rhodes, P. R. 
Rider, F. D. Rigby, Herman Rubin, Arthur Sard, Henry Scheffé, E. D. Schell, I. E. Segal, 
G. R. Seth, W. B. Simpson, Andrew Sobczyk, E. W. Stacy, C. M. Stein, A. G. Swanson, 
Zenon Szatrowski, R. M. Thrall, A. W. Tucker, J. W. Tukey, W. A. Wallis, J. E. Walsh, 
J. E. Wilkins, Jr., S. S. Wilks, M. A. Woodbury. 

The Tuesday morning session was devoted to contributed papers. Professor 
K. J. Arnold of the University of Wisconsin presided. The attendance was 
approximately forty. The following papers were presented: 

1. On Distribution-free Confidence Intervals. Preliminary Report. 

Dr. Wassily Hoeffding, Institute of Statistics, University of North Carolina. 

2. On Certain Statistics for Samples of 3 from a Normal Population. 

Mr. Julius Lieblein, Statistical Engineering Laboratory, National Bureau of 
Standards. Presented by Dr. Churchill Eisenhart. 

3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis of Pareto's 
and Pearson's Curves. 

Professor Maria Castellani, University of Kansas City. 

4. Fitting Generalized Truncated Normal Distributions. 

Professor Harold Hotelling, Institute of Statistics, University of North Carolina. 

5. On the Distribution of the Two Closest Observations Among a Set of Three Independent 
Observations. 

Professor G. R. Seth, Statistical Laboratory, Iowa State College. 

6. The Derivation of Certain Recurrence Formulae and their Application to the Extension 
of Existing Published Incomplete Beta Function Tables. 

Dr. T. A. Bancroft, Alabama Polytechnic Institute. (Presented by title.) 

On Tuesday afternoon a session for contributed papers was held jointly with 
the American Mathematical Society. Professor P. S. Dwyer of the University 
of Michigan presided. The attendance was approximately eighty. The 
following papers were presented: 





7. Asymptotic Studentization in Testing Hypothesis. 

Dr. Herman Chernoff, Cowles Commission, University of Chicago. 

8. Completeness, Similar Regions and Unbiased Estimation. Preliminary Report. 

Professor E. L. Lehmann, University of California and Professor Henry Scheffé, 
University of California at Los Angeles. 

9. On a Proposed Method for Estimating Populations. 

Professor C. C. Craig, University of Michigan. 

10. Some Results on the Asymptotic Distribution of Maximum- and 
Quasi-maximum-likelihood Estimates. 

Dr. Herman Rubin, Institute for Advanced Study. 

11. The Probability Points of the Distribution of the Median in Random Samples from any 
Continuous Population. 

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering 
Laboratory, National Bureau of Standards. 

12. On the Arithmetic Mean and the Median in Small Samples from the Normal and Certain 
Non-normal Populations. 

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering 
Laboratory, National Bureau of Standards. 

13. The Relative Frequencies with which Certain Estimators of the Standard Deviation of a 
Normal Population Tend to Underestimate Its Value. 

Dr. Churchill Eisenhart and Celia S. Martin, Statistical Engineering Laboratory, 
National Bureau of Standards. 

14. Some Non-parametric Tests of Whether the Largest Observations of a Set are too Large. 
Preliminary Report. 

Dr. J. E. Walsh, Project Rand, Santa Monica, California. 

15. On Some Bounded Significance Level Properties of the Equaltail Sign Test for the 
Mean. 

Dr. J. E. Walsh, Project Rand, Santa Monica, California. (Presented by title.) 

16. Infinitely Divisible Distributions. 

Professor Will Feller, Cornell University. (Presented by title.) 

17. Fluctuation Theory of Recurrent Events. 

Professor Will Feller, Cornell University. (Presented by title.) 

18. Formulae for the Percentage Points of the Distributions of the Arithmetic Mean in 
Random Samples from Certain Symmetrical Universes. 

Mr. Uttam Chand, University of North Carolina and National Bureau of Standards. 
(Presented by title.) 


Abstracts of the contributed papers appear elsewhere in this issue of the Annals. 

On Wednesday morning the Institute and the Econometric Society held a joint 
session on Stochastic Processes with Professor Harold Hotelling of the University 
of North Carolina presiding. Attendance was approximately ninety. Professor 
Hotelling presented an Historical Summary of the Problem. Professor J. L. Doob 
of the University of Illinois presented a paper, Stochastic Difference Equations 
and Stochastic Differential Equations. Professor Subrahmanyan Chandrasekhar 
of the University of Chicago presented a paper, Brownian Motion, Dynamical 
Friction and Stellar Dynamics. 

The three joint sessions of the Institute and the Econometric Society on 
Thursday were devoted to a Symposium on the Theory of Games. The maximum 
attendance was approximately three hundred. The first morning session was 
held under the chairmanship of Professor S. S. Wilks of Princeton University. 
Professor John von Neumann of the Institute for Advanced Study presented a 
paper, Survey of the Theory of Games. Professor Oskar Morgenstern of Princeton 
University presented a paper, Economics and the Theory of Games. Dr. M. A. 
Girshick of Project Rand presented a paper, Statistics and the Theory of Games. 
The second morning session was under the chairmanship of Professor John von 
Neumann of the Institute for Advanced Study. Dr. E. W. Paxson of Project 
Rand presented a paper, Recent Developments. Professor J. W. Tukey of 
Princeton University presented a paper, A Problem in Strategy. Dr. G. B. Dantzig of 
the Army Air Forces presented a paper, Programming in a Linear Structure. The 
final session of the symposium was a round table discussion with Professor John 
von Neumann of the Institute for Advanced Study as chairman and with the 
following participants: Dr. G. B. Dantzig, Dr. M. A. Girshick, Professor Harold 
Hotelling, Professor Irving Kaplansky, Professor Samuel Karlin, Dr. J. C. C. 
McKinsey, Professor Oskar Morgenstern, Dr. E. W. Paxson, Dr. L. S. Shapley, 
and Professor J. W. Tukey. 

A membership business meeting was held on Friday, September 10, in Bascom 
Hall at which twenty-one members were present. An account of the business 
transacted at this meeting may be found elsewhere in this issue under the heading 
“Adoption of the New Constitution.” 

The final session was on Sequential Estimation and was held jointly with the 
Econometric Society on Friday morning with Professor Jerzy Neyman of the 
University of California presiding. Attendance was approximately fifty. 
Professor Charles Stein of the University of California presented a paper on 
Sequential Estimation. Professor W. A. Wallis of the University of Chicago presented 
a discussion. 

Social affairs during the meeting included a tea Tuesday afternoon, a concert 
of the Pro Arte String Quartet Tuesday evening, a dinner Wednesday evening, 
a picnic Thursday afternoon, and a beer party Thursday evening. 

K. J. Arnold 
Assistant Secretary 



BIOMETRIKA 


A Journal for the Statistical Study of Biological Problems 


Volume XXXV Contents Parts I and II, 1948 


I. Memoir of James Fowler Tocher (1864-1945). By Major Greenwood. 
II. On some modes of population growth leading to R. A. Fisher’s logarithmic series distribution. By David 
G. Kendall. III. The studentised form of the extreme mean square test in the analysis of variance. By 
K. R. Nair. IV. The estimation of non-linear parameters by ‘internal least squares’. By H. O. Hartley. 
V. The geometrical method in the theory of sampling. By David Fog. VI. Proofs of the distribution 
law of the second order moment statistics. By John Wishart. VII. Tests of significance in multivariate 
analysis. By C. R. Rao. VIII. Alternative systems in the analysis of variance. By N. L. Johnson. IX. 
An examination and further development of a formula arising in the problem of comparing two mean values. 
By Alice A. Aspin. X. On the power function of the longest run as a test for randomness in a sequence of 
alternatives. By G. Bateman. XI. Sur les courbes de fréquence de K. Pearson. By M. Dumas. XII. 
The distribution of the extreme deviate from the sample mean and its studentised form. By K. R. Nair. 
XIII. The Fisher-Yates test of significance in 2 x 2 contingency tables. By D. J. Finney. XIV. The power 
function of the test for the difference between two proportions in a 2 x 2 table. By P. B. Patnaik. XV. 
The analysis of contingency tables with groupings based on quantitative characters. By F. Yates. XVI. 
The probability integral transformation when parameters are estimated from the sample. By F. N. David 
and N. L. Johnson. XVII. A table for the calculation of working probits and weights in probit analysis. 
By D. J. Finney and W. L. Stevens. XVIII. Miscellanea: A note on the χ² smooth test. By H. L. Seal. 
Rank correlation and product-moment correlation. By P. A. P. Moran. Tests of significance in the variate 
difference method. By N. L. Johnson. XIX. Review: M. G. Kendall’s The Advanced Theory of 
Statistics, Vol. II. By B. L. Welch. 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 


ECONOMETRICA 

Journal of the Econometric Society 
Contents of Vol. 16, No. 3, July, 1948, include: 

Page 

Joseph A. Schumpeter: Irving Fisher’s Econometrics. 219 
André Nataf: Sur la possibilité de construction de certains macromodèles... 232 
Duncan Black: The Decisions of a Committee Using a Special Majority .... 245 
Duncan Black: The Elasticity of Committee Decisions with an Altering Size 
of Majority. 262 

Published Quarterly Subscription to Nonmembers: $9.00 per year 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics. 

Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in 
applying for membership should be addressed to Alfred Cowles, Secretary and Treasurer, The 
Econometric Society, The University of Chicago, Chicago 37, Illinois, U.S.A. 







SAMPLE SESSIONS: Basic Features of a National System of Statistical Intelligence; 
Statistical Requirements for Economic Mobilization; Recent Advances in Mathematical 
Statistics; Statistics for the Clinician and Doctor; The Second Statistics Course; 
Review of Statistical Methodology; Business Outlook for 1949; Statistics of the Kinsey 
Report; Ten-Country Survey of Public Opinion; Input-Output Analysis and its use in 
Peace and War Economics; Sampling Methodology; etc. 

entire program in THE AMERICAN STATISTICIAN, October 1948 

for further information write 

THE AMERICAN STATISTICAL ASSOCIATION, 1603 K Street N.W., Washington 6, D.C. 
















Wiley Books in Statistics 

NOMOGRAPHY 

by ALEXANDER S. LEVENS 

“The writer and publishers are to be 
commended for a splendid contribution 
to the field of nomography. The 
illustrations and drafting examples are well 
done. The author has a good 
arrangement of the subject matter with a clear 
mathematical proof of the graph leaving 
no room for doubting its accuracy. I 
like the wide range of engineering 
applications as problems.” 
—Professor T. C. Brown, University of 
North Carolina 

1948 176 pages $3.00 

Introduction to 
MATHEMATICAL 
STATISTICS 

by PAUL G. HOEL 

“In view of its general excellence, its 
publication is rather more than just a 
welcome event: it is the most important 
happening in the field of undergraduate 
statistical textbooks in several years.” 

—J. H. Curtiss, National Bureau of 
Standards 

1947 258 pages $3.50 

CONTROL CHARTS 
IN FACTORY 
MANAGEMENT 

by WILLIAM B. RICE 

“Control Charts in Factory Management 
is an able presentation of the 
statistics, philosophy and practical 
applications of the control chart tool of 
quality control.” 

—A. V. Feigenbaum, General Electric 

Review 

1947 149 pages $2.50 



FORECASTING 
FOR PROFIT 

A Technique of Business 
Management 
by WILSON WRIGHT 

“Mr. Wright’s book is devoted primarily 
to the application of available data and 
knowledge to forecasting for a business 
enterprise. In this field it is a 
completely competent and highly useful 
book. It is well organized, logical in 
content and argument and modest in its 
claims.” 

—A. G. Abramson, American Economic 
Review 

1947 173 pages $2.75 

SEQUENTIAL ANALYSIS 

by ABRAHAM WALD 

“The Sequential Analysis marks the 
greatest advance in statistical theory 
in recent years and I have no doubt that 
Wald’s book will be a great success.” 

—Professor William Feller, Cornell 
University 

“It will be a most stimulating and 
valuable book because of Wald’s elegant 
and rigorous approach ...” 

—Professor Irving W. Burr, Purdue 
University 

1947 212 pages $4.00 

GOVERNMENT 
STATISTICS 
FOR BUSINESS USE 

Edited by PHILIP M. HAUSER 
and WILLIAM R. LEONARD 

“I think the book should be invaluable 
not only to businessmen but also to 
educators and statisticians. It covers 
the field well and with great clarity.” 
—Professor Irving Lorge, Columbia 
University 

1946 432 pages $5.00 


JOHN WILEY & SONS, Inc., 440 Fourth Ave., New York 16 




McGraw-Hill Books of Timely Importance 


SAMPLING INSPECTION 

Principles, Procedures, and Tables for Single, Double, 
and Sequential Sampling in Acceptance Inspection and Quality Control 
Based on Percent Defective 

By The Statistical Research Group, Columbia University. OSRD. 

395 pages, $5.25 

A systematic presentation of the best current inspection practices, together with tables 
and detailed instructions for carrying out these practices. It includes a complete, 
consistent and workable sampling inspection program ready for installation. 

SELECTED TECHNIQUES OF STATISTICAL ANALYSIS 

For Scientific and Industrial Research and Production 
and Management Engineering 

By The Statistical Research Group, Columbia University. OSRD. 

473 pages, $6.00 

Discusses a series of problems which occur frequently in planning, analyzing or interpreting 
quantitative data, and explains various techniques appropriate to these problems. 

STATISTICAL QUALITY CONTROL 

By Eugene L. Grant, Stanford University. 

McGraw-Hill Industrial Organization and Management Series. 

563 pages, $5.00 

Deals with the laws of probability that may be used to improve acceptance procedures and 
thus secure the best possible quality assurance from a given inspection cost. Explains the 
Shewhart control chart and its use in manufacturing. 

RUDIMENTARY MATHEMATICS 
FOR ECONOMISTS AND STATISTICIANS 

By W. L. Crum and J. A. Schumpeter, Harvard University 
206 pages, $2.75 

Presents rudimentary ideas and operations essential to any effective mathematical 
reasoning by economists and statisticians. Covers graphic analysis (simplest case), curves and 
equations, limits, rates and derivatives, maxima and minima, differential equations, and 
determinants. 


Send for copies on approval 


McGRAW-HILL BOOK COMPANY, Inc. 

330 West 42nd Street New York 18, N. Y. 











